<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en"><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://www.alexbevi.com/feed.xml" rel="self" type="application/atom+xml" /><link href="https://www.alexbevi.com/" rel="alternate" type="text/html" hreflang="en" /><updated>2026-04-29T09:46:11-04:00</updated><id>https://www.alexbevi.com/feed.xml</id><title type="html">ALEX BEVILACQUA</title><subtitle>Programming, MongoDB, Ruby and anything else I find interesting. Sometimes dabble in RPG/JRPG/Adventure game reviews.</subtitle><entry><title type="html">Start With Context: Building the Retrieval Core for Agentic Apps</title><link href="https://www.alexbevi.com/blog/2026/04/29/start-with-context-building-the-retrieval-core-for-agentic-apps/" rel="alternate" type="text/html" title="Start With Context: Building the Retrieval Core for Agentic Apps" /><published>2026-04-29T06:47:47-04:00</published><updated>2026-04-29T09:45:29-04:00</updated><id>https://www.alexbevi.com/blog/2026/04/29/start-with-context-building-the-retrieval-core-for-agentic-apps</id><content type="html" xml:base="https://www.alexbevi.com/blog/2026/04/29/start-with-context-building-the-retrieval-core-for-agentic-apps/"><![CDATA[<p><em>Before you add planners, crews, or graph-shaped orchestration, build the part that decides what the model should actually see. In this first post, we’ll start an enterprise support copilot and give it the one capability every future agent depends on: retrieval that doesn’t fall apart in production.</em></p>

<p>In a recent post I made the case that MongoDB can serve as <a href="https://alexbevi.com/blog/2026/04/15/mongodb-as-the-brain-of-modern-ai-applications/">the “brain” of a modern AI application</a> by combining durable state, retrieval, and application data in one place. That framing still holds, but brains are only useful if they can recall the right thing at the right time. I want to dig into agentic application development in more detail over a series of posts, so for the first real entry I’m starting one layer below “agents” and one layer above raw storage: the context layer.</p>

<p>That might sound slightly less glamorous than “multi-agent orchestration,” which is exactly why it matters. Most enterprise AI systems do not fail because they lack a clever planner. They fail because the model sees the wrong document, too much irrelevant text, or none of the operational data that actually matters.</p>

<p>To make this concrete, the application thread for this series will be an <strong>enterprise support escalation copilot</strong> for a B2B SaaS team. By the end of the series, it should be able to answer questions about incidents, remember previous escalations, pull account context, and coordinate specialized agents when needed. Today, though, we’re giving it its first useful skill: finding the right context for the job.</p>

<p>Think about the kind of question a real support engineer asks:</p>

<blockquote>
  <p>“Acme’s enterprise tenant started seeing <code class="language-plaintext highlighter-rouge">INV-4421</code> after upgrading to <code class="language-plaintext highlighter-rouge">3.8</code>. Did we see this before, is there a known workaround, and does it affect EU clusters only?”</p>
</blockquote>

<p>That is not a pure semantic search problem. It is part natural language, part exact identifier lookup, part metadata filtering, and part ranking problem. Error codes matter. Version numbers matter. Tenant boundaries matter. Timing matters. That’s why this is such a good place to start. To solve this problem, we’ll dig in with <a href="https://www.mongodb.com/products/platform/atlas-database">MongoDB</a> and <a href="https://www.mongodb.com/docs/voyageai/">Voyage AI</a>.</p>

<p><a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-overview">MongoDB Vector Search</a> is built to search vector data alongside the rest of your operational data, supports filtering on other fields in the collection, and can be combined with full-text search for hybrid retrieval. MongoDB’s hybrid search documentation explicitly describes combining semantic and full-text search results with Reciprocal Rank Fusion, which is exactly what you want when a query mixes fuzzy intent with exact strings like issue IDs, SKUs, or feature flags.</p>
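
<p>To make Reciprocal Rank Fusion less abstract, here is a minimal, framework-free sketch of the scoring idea. The document ids are made up for illustration, and the penalty constant of 60 is simply a common choice; this shows the general technique, not MongoDB’s internal implementation.</p>

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, penalty=60.0):
    """Fuse several ranked lists of document ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (penalty + rank) for every document it returned.
            scores[doc_id] += 1.0 / (penalty + rank)
    # Highest fused score first; ties fall back to the id so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a vector query and a full-text query.
vector_hits = ["kb-12", "incident-7", "runbook-3"]
fulltext_hits = ["incident-7", "release-3.8", "kb-12"]

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
# incident-7 ranks first because it sits near the top of both lists.
```

<p>A document that appears in both lists beats one that is ranked first in only one of them, which is the behavior you want when a query mixes intent with an exact string like <code class="language-plaintext highlighter-rouge">INV-4421</code>.</p>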

<p>On the retrieval-model side, Voyage provides high-accuracy embedding and reranking models, including newer capabilities like contextualized chunk embeddings, multimodal embeddings, and rerankers designed to refine the top candidate set after initial retrieval. MongoDB Atlas now also exposes Voyage models through its Embedding and Reranking API, currently in preview, which means you can either call Voyage models directly or keep retrieval models, vector search, and operational data closer together under Atlas.</p>

<p>So what does the retrieval core for our support copilot actually do?</p>

<p>First, it stores source material in MongoDB: runbooks, release notes, KB articles, previous incident reviews, ticket summaries, and whatever structured account data the support flow needs. Then it chunks the long-form content, embeds it with Voyage, and stores the vectors with the source text and metadata. At query time, it narrows scope using metadata like tenant, product, region, or severity; retrieves candidates semantically and lexically; reranks the best matches; and only then hands a compact, relevant context window to the LLM. In other words: don’t ask the model to be psychic when the database can be specific.</p>
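
<p>As a rough sketch of the ingestion half of that flow, the snippet below splits a long document into overlapping word-based chunks and attaches the metadata that later drives filtering. The field names (<code class="language-plaintext highlighter-rouge">source_id</code>, <code class="language-plaintext highlighter-rouge">chunk_index</code>, <code class="language-plaintext highlighter-rouge">tenant_id</code>, and so on) and the chunk sizes are assumptions for illustration; the embedding step is deliberately left out.</p>

```python
def chunk_text(text, chunk_size=200, overlap=40):
    """Split text into overlapping chunks, measured in words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


def build_chunk_records(source_id, text, metadata, **chunk_options):
    """Return one record per chunk, ready to be embedded and inserted."""
    records = []
    for position, chunk in enumerate(chunk_text(text, **chunk_options)):
        record = {"source_id": source_id, "chunk_index": position, "text": chunk}
        # Copy filterable metadata onto every chunk so it can be indexed.
        record.update(metadata)
        records.append(record)
    return records


# Hypothetical runbook text and metadata.
sample_text = " ".join(f"w{i}" for i in range(10))
records = build_chunk_records(
    "runbook-3",
    sample_text,
    {"tenant_id": "acme", "product": "billing", "region": "eu"},
    chunk_size=4,
    overlap=1,
)
```

<p>Keeping the metadata on each chunk, rather than only on the parent document, is what lets the query-time filter narrow the search space before any similarity scoring happens.</p>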

<p>There are a lot of AI frameworks right now, and they absolutely do not all feel the same. But this is the first important pattern in the series: <strong>the framework should shape the developer experience, not force you to redesign the data layer every six months</strong>. The retrieval architecture is the stable part. MongoDB and Voyage AI are the stable parts. LangChain, LlamaIndex, Haystack, LangGraph, CrewAI, or whatever comes next should be able to sit on top of that foundation.</p>

<h2 id="a-framework-agnostic-mental-model">A framework-agnostic mental model</h2>

<p>Before jumping into code, here is the mental model I’d keep fixed no matter which framework you prefer:</p>

<ol>
  <li>Put source documents and operational records in MongoDB.</li>
  <li>Generate embeddings with Voyage.</li>
  <li>Index vector fields and filter fields in MongoDB.</li>
  <li>Use semantic retrieval for meaning.</li>
  <li>Use full-text retrieval for exact strings.</li>
  <li>Rerank the candidate set before generation.</li>
  <li>Return only the context the model actually needs.</li>
</ol>

<p>That shape maps cleanly to both MongoDB Vector Search and Voyage’s model stack. MongoDB handles vector indexes, full-text search, filterable metadata, and live application data; Voyage handles embeddings and reranking; the framework becomes the control surface.</p>

<h2 id="approach-1-langchain-for-the-shortest-path-from-data-to-grounded-answers">Approach 1: LangChain for the shortest path from data to grounded answers</h2>

<p>If the goal is to get a retrieval-backed application running quickly, LangChain remains a very practical starting point. <a href="https://www.mongodb.com/docs/atlas/ai-integrations/langchain/">MongoDB’s LangChain integration</a> supports vector search, full-text search, and a hybrid retriever that combines both with Reciprocal Rank Fusion. It also supports pre-filtering with MQL expressions, which matters immediately for tenant scoping and product boundaries.</p>

<p>An illustrative version for our support copilot looks like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="kn">import</span> <span class="n">os</span>

<span class="kn">from</span> <span class="n">langchain_voyageai</span> <span class="kn">import</span> <span class="n">VoyageAIEmbeddings</span>
<span class="kn">from</span> <span class="n">langchain_mongodb.vectorstores</span> <span class="kn">import</span> <span class="n">MongoDBAtlasVectorSearch</span>
<span class="kn">from</span> <span class="n">langchain_mongodb.retrievers.hybrid_search</span> <span class="kn">import</span> <span class="n">MongoDBAtlasHybridSearchRetriever</span>

<span class="n">embeddings</span> <span class="o">=</span> <span class="nc">VoyageAIEmbeddings</span><span class="p">(</span>
    <span class="n">api_key</span><span class="o">=</span><span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="sh">"</span><span class="s">VOYAGE_API_KEY</span><span class="sh">"</span><span class="p">],</span>
    <span class="n">model</span><span class="o">=</span><span class="sh">"</span><span class="s">voyage-4</span><span class="sh">"</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">vector_store</span> <span class="o">=</span> <span class="n">MongoDBAtlasVectorSearch</span><span class="p">.</span><span class="nf">from_connection_string</span><span class="p">(</span>
    <span class="n">connection_string</span><span class="o">=</span><span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="sh">"</span><span class="s">MONGODB_URI</span><span class="sh">"</span><span class="p">],</span>
    <span class="n">namespace</span><span class="o">=</span><span class="sh">"</span><span class="s">support.context</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">embedding</span><span class="o">=</span><span class="n">embeddings</span><span class="p">,</span>
    <span class="n">index_name</span><span class="o">=</span><span class="sh">"</span><span class="s">support_vector_index</span><span class="sh">"</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">retriever</span> <span class="o">=</span> <span class="nc">MongoDBAtlasHybridSearchRetriever</span><span class="p">(</span>
    <span class="n">vectorstore</span><span class="o">=</span><span class="n">vector_store</span><span class="p">,</span>
    <span class="n">search_index_name</span><span class="o">=</span><span class="sh">"</span><span class="s">support_search_index</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">k</span><span class="o">=</span><span class="mi">8</span><span class="p">,</span>
    <span class="n">fulltext_penalty</span><span class="o">=</span><span class="mf">60.0</span><span class="p">,</span>
    <span class="n">vector_penalty</span><span class="o">=</span><span class="mf">60.0</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">docs</span> <span class="o">=</span> <span class="n">retriever</span><span class="p">.</span><span class="nf">invoke</span><span class="p">(</span>
    <span class="sh">"</span><span class="s">Acme tenant seeing INV-4421 after upgrading to 3.8</span><span class="sh">"</span>
<span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>In a production version, I’d pair that with metadata filters on fields like <code class="language-plaintext highlighter-rouge">tenant_id</code>, <code class="language-plaintext highlighter-rouge">product</code>, <code class="language-plaintext highlighter-rouge">region</code>, and <code class="language-plaintext highlighter-rouge">severity</code>, then pass the top candidates through a Voyage reranker before generation. The point is not that LangChain is magical. The point is that the MongoDB + Voyage retrieval story already fits the way LangChain applications are commonly assembled.</p>
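
<p>A small helper for building that metadata filter might look like the sketch below. It returns a plain MQL-style dictionary using operators that MongoDB Vector Search filters accept (<code class="language-plaintext highlighter-rouge">$eq</code>, <code class="language-plaintext highlighter-rouge">$in</code>, <code class="language-plaintext highlighter-rouge">$gte</code>, <code class="language-plaintext highlighter-rouge">$and</code>). The function name, the field names, and the assumption that <code class="language-plaintext highlighter-rouge">severity</code> is stored as a number are mine; how the dictionary is handed to the retriever depends on the integration version you have installed.</p>

```python
def build_pre_filter(tenant_id, product=None, regions=None, min_severity=None):
    """Build an MQL-style filter that scopes retrieval to one tenant."""
    # Tenant scoping is always applied; everything else is optional.
    clauses = [{"tenant_id": {"$eq": tenant_id}}]
    if product is not None:
        clauses.append({"product": {"$eq": product}})
    if regions:
        clauses.append({"region": {"$in": list(regions)}})
    if min_severity is not None:
        # Assumes severity is stored as a number, higher meaning more severe.
        clauses.append({"severity": {"$gte": min_severity}})
    return {"$and": clauses}


# Hypothetical scope for the Acme question above.
acme_filter = build_pre_filter("acme", product="billing", regions=["eu-west", "eu-central"])
```

<p>Every field referenced in the filter also has to be declared as a filter field in the vector search index, otherwise the query will be rejected.</p>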

<h2 id="approach-2-llamaindex-when-the-center-of-gravity-is-the-data-itself">Approach 2: LlamaIndex when the center of gravity is the data itself</h2>

<p>If LangChain often feels application-first, LlamaIndex tends to feel data-first. That makes it a very natural fit when you want to spend more time shaping ingestion, chunking, metadata, and query behavior.</p>

<p>With <a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/ai-integrations/llamaindex/">MongoDB’s LlamaIndex integration</a>, we can pair <code class="language-plaintext highlighter-rouge">VoyageEmbedding</code> with <code class="language-plaintext highlighter-rouge">MongoDBAtlasVectorSearch</code> and make metadata filters very explicit, which is helpful for real enterprise retrieval where “give me the right answer” usually means “give me the right answer for <em>this tenant</em>, <em>this region</em>, and <em>this product line</em>.”</p>

<p>The shape is roughly:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="kn">import</span> <span class="n">os</span>

<span class="kn">from</span> <span class="n">pymongo</span> <span class="kn">import</span> <span class="n">MongoClient</span>
<span class="kn">from</span> <span class="n">llama_index.embeddings.voyageai</span> <span class="kn">import</span> <span class="n">VoyageEmbedding</span>
<span class="kn">from</span> <span class="n">llama_index.vector_stores.mongodb</span> <span class="kn">import</span> <span class="n">MongoDBAtlasVectorSearch</span>
<span class="kn">from</span> <span class="n">llama_index.core</span> <span class="kn">import</span> <span class="n">StorageContext</span><span class="p">,</span> <span class="n">VectorStoreIndex</span>
<span class="kn">from</span> <span class="n">llama_index.core.retrievers</span> <span class="kn">import</span> <span class="n">VectorIndexRetriever</span>
<span class="kn">from</span> <span class="n">llama_index.core.vector_stores</span> <span class="kn">import</span> <span class="n">MetadataFilters</span><span class="p">,</span> <span class="n">ExactMatchFilter</span>

<span class="n">embed_model</span> <span class="o">=</span> <span class="nc">VoyageEmbedding</span><span class="p">(</span>
    <span class="n">voyage_api_key</span><span class="o">=</span><span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="sh">"</span><span class="s">VOYAGE_API_KEY</span><span class="sh">"</span><span class="p">],</span>
    <span class="n">model_name</span><span class="o">=</span><span class="sh">"</span><span class="s">voyage-4</span><span class="sh">"</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">mongo_client</span> <span class="o">=</span> <span class="nc">MongoClient</span><span class="p">(</span><span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">[</span><span class="sh">"</span><span class="s">MONGODB_URI</span><span class="sh">"</span><span class="p">])</span>
<span class="n">vector_store</span> <span class="o">=</span> <span class="nc">MongoDBAtlasVectorSearch</span><span class="p">(</span>
    <span class="n">mongo_client</span><span class="p">,</span>
    <span class="n">db_name</span><span class="o">=</span><span class="sh">"</span><span class="s">support</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">collection_name</span><span class="o">=</span><span class="sh">"</span><span class="s">context</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">vector_index_name</span><span class="o">=</span><span class="sh">"</span><span class="s">support_vector_index</span><span class="sh">"</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">storage_context</span> <span class="o">=</span> <span class="n">StorageContext</span><span class="p">.</span><span class="nf">from_defaults</span><span class="p">(</span><span class="n">vector_store</span><span class="o">=</span><span class="n">vector_store</span><span class="p">)</span>

<span class="c1"># docs is your loaded support corpus, such as runbooks, incident reviews,
# release notes, and ticket summaries.
</span><span class="n">vector_index</span> <span class="o">=</span> <span class="n">VectorStoreIndex</span><span class="p">.</span><span class="nf">from_documents</span><span class="p">(</span>
    <span class="n">docs</span><span class="p">,</span>
    <span class="n">storage_context</span><span class="o">=</span><span class="n">storage_context</span><span class="p">,</span>
    <span class="n">embed_model</span><span class="o">=</span><span class="n">embed_model</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">filters</span> <span class="o">=</span> <span class="nc">MetadataFilters</span><span class="p">(</span>
    <span class="n">filters</span><span class="o">=</span><span class="p">[</span><span class="nc">ExactMatchFilter</span><span class="p">(</span><span class="n">key</span><span class="o">=</span><span class="sh">"</span><span class="s">tenant_id</span><span class="sh">"</span><span class="p">,</span> <span class="n">value</span><span class="o">=</span><span class="sh">"</span><span class="s">acme</span><span class="sh">"</span><span class="p">)]</span>
<span class="p">)</span>

<span class="n">retriever</span> <span class="o">=</span> <span class="nc">VectorIndexRetriever</span><span class="p">(</span>
    <span class="n">index</span><span class="o">=</span><span class="n">vector_index</span><span class="p">,</span>
    <span class="n">filters</span><span class="o">=</span><span class="n">filters</span><span class="p">,</span>
    <span class="n">similarity_top_k</span><span class="o">=</span><span class="mi">10</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">nodes</span> <span class="o">=</span> <span class="n">retriever</span><span class="p">.</span><span class="nf">retrieve</span><span class="p">(</span><span class="sh">"</span><span class="s">Known workaround for INV-4421?</span><span class="sh">"</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>What I like about this path is that it keeps the retrieval pipeline honest. You can see the data model. You can see the filter model. You can see how chunking choices affect what comes back. For article one in a series like this, that clarity is useful because it keeps us focused on context quality before we get distracted by agent loops.</p>

<h2 id="approach-3-haystack-when-you-want-explicit-composable-pipelines">Approach 3: Haystack when you want explicit, composable pipelines</h2>

<p>Haystack is a nice fit for teams that prefer explicit components over higher-level abstractions. <a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/ai-integrations/haystack/">MongoDB’s Haystack integration</a> uses a <code class="language-plaintext highlighter-rouge">MongoDBAtlasDocumentStore</code> with MongoDB retrievers, and the official tutorial pairs that with Voyage embedders. Haystack’s MongoDB integration also has separate semantic and full-text retrievers, which is useful when you want to make the retrieval strategy itself a first-class part of the pipeline.</p>

<p>A trimmed-down version looks like this:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="kn">from</span> <span class="n">haystack</span> <span class="kn">import</span> <span class="n">Pipeline</span>
<span class="kn">from</span> <span class="n">haystack.utils</span> <span class="kn">import</span> <span class="n">Secret</span>
<span class="kn">from</span> <span class="n">haystack_integrations.components.embedders.voyage_embedders</span> <span class="kn">import</span> <span class="n">VoyageTextEmbedder</span>
<span class="kn">from</span> <span class="n">haystack_integrations.document_stores.mongodb_atlas</span> <span class="kn">import</span> <span class="n">MongoDBAtlasDocumentStore</span>
<span class="kn">from</span> <span class="n">haystack_integrations.components.retrievers.mongodb_atlas</span> <span class="kn">import</span> <span class="n">MongoDBAtlasEmbeddingRetriever</span>

<span class="n">document_store</span> <span class="o">=</span> <span class="nc">MongoDBAtlasDocumentStore</span><span class="p">(</span>
    <span class="n">mongo_connection_string</span><span class="o">=</span><span class="n">Secret</span><span class="p">.</span><span class="nf">from_env_var</span><span class="p">(</span><span class="sh">"</span><span class="s">MONGODB_URI</span><span class="sh">"</span><span class="p">),</span>
    <span class="n">database_name</span><span class="o">=</span><span class="sh">"</span><span class="s">support</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">collection_name</span><span class="o">=</span><span class="sh">"</span><span class="s">context</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">vector_search_index</span><span class="o">=</span><span class="sh">"</span><span class="s">support_vector_index</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">full_text_search_index</span><span class="o">=</span><span class="sh">"</span><span class="s">support_search_index</span><span class="sh">"</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">pipeline</span> <span class="o">=</span> <span class="nc">Pipeline</span><span class="p">()</span>
<span class="n">pipeline</span><span class="p">.</span><span class="nf">add_component</span><span class="p">(</span><span class="sh">"</span><span class="s">query_embedder</span><span class="sh">"</span><span class="p">,</span> <span class="nc">VoyageTextEmbedder</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="sh">"</span><span class="s">voyage-4</span><span class="sh">"</span><span class="p">))</span>
<span class="n">pipeline</span><span class="p">.</span><span class="nf">add_component</span><span class="p">(</span>
    <span class="sh">"</span><span class="s">retriever</span><span class="sh">"</span><span class="p">,</span>
    <span class="nc">MongoDBAtlasEmbeddingRetriever</span><span class="p">(</span><span class="n">document_store</span><span class="o">=</span><span class="n">document_store</span><span class="p">,</span> <span class="n">top_k</span><span class="o">=</span><span class="mi">10</span><span class="p">),</span>
<span class="p">)</span>
<span class="n">pipeline</span><span class="p">.</span><span class="nf">connect</span><span class="p">(</span><span class="sh">"</span><span class="s">query_embedder.embedding</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">retriever.query_embedding</span><span class="sh">"</span><span class="p">)</span>

<span class="n">result</span> <span class="o">=</span> <span class="n">pipeline</span><span class="p">.</span><span class="nf">run</span><span class="p">(</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">query_embedder</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">text</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">Known workaround for INV-4421?</span><span class="sh">"</span><span class="p">}}</span>
<span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This is probably the most “pipe and fitting” version of the three, and that is a compliment. For enterprise teams, explicit systems are often easier to debug, evaluate, and explain. And once again, the interesting part is not that the framework is different. It is that the same MongoDB + Voyage retrieval core still fits.</p>

<h2 id="why-mongodb">Why MongoDB</h2>

<p>The support copilot does not just need chunks in a vector store. It needs chunks, source documents, tenant metadata, ticket references, account records, release versions, and eventually execution state. MongoDB Vector Search lets you search semantic meaning alongside that operational data, pre-filter the search space using indexed fields, and combine vector and full-text retrieval when exact terms matter. Change streams then give you a way to react to new or updated records in real time, which is exactly what you want when incidents, tickets, or KB articles change during the workday.</p>
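
<p>As a hedged sketch of the change-stream idea, the function below watches a collection and hands each inserted, updated, or replaced document to a callback, for example to re-chunk and re-embed it. The function name and the callback are hypothetical; <code class="language-plaintext highlighter-rouge">collection</code> is assumed to be a PyMongo collection on a replica set or Atlas cluster, since change streams are not available on a standalone server.</p>

```python
def watch_for_stale_chunks(collection, handle_change):
    """Call handle_change(doc_id, document) for each insert, update, or replace."""
    pipeline = [
        {"$match": {"operationType": {"$in": ["insert", "update", "replace"]}}}
    ]
    # full_document="updateLookup" asks the server to include the current
    # version of the document on updates, not only the changed fields.
    with collection.watch(pipeline, full_document="updateLookup") as stream:
        for change in stream:
            handle_change(change["documentKey"]["_id"], change.get("fullDocument"))


# Hypothetical usage, assuming a PyMongo client and a re-embedding function:
# watch_for_stale_chunks(client["support"]["source_documents"], reembed_document)
```

<p>The loop blocks while it waits for events, so in practice this would run in its own worker process rather than inside the request path.</p>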

<p>And if you want an even tighter platform story, MongoDB Atlas now exposes Voyage models directly through the <a href="https://www.mongodb.com/docs/voyageai/api-reference/overview/">Embedding and Reranking API</a>. That API is database-agnostic, but it pairs especially well with Atlas because it reduces the number of moving pieces needed to stand up a modern retrieval pipeline. Fewer services, fewer credentials, and less time spent debugging “why is this top result here?”</p>

<p>This is also where the framework story becomes easier to reason about. LangChain, LlamaIndex, and Haystack all give you different ergonomics. MongoDB stays the system where the data lives. Voyage stays the retrieval layer that improves what gets surfaced. That is a much more durable architecture than betting everything on whichever orchestration framework happens to be loudest this quarter.</p>

<h2 id="what-next">What next?</h2>

<p>Once the retrieval core is solid, adding agents becomes a lot more interesting.</p>

<p>In the next post, I’ll take this same support copilot and add <strong>short-term execution state</strong> and <strong>long-term memory</strong>. LangGraph is a natural next step because it <a href="https://docs.langchain.com/oss/python/langgraph/persistence">separates persistence</a> into checkpoints for thread state and stores for long-term memory, and MongoDB already has first-class integrations for both the LangGraph checkpointer and the long-term store. That is where the earlier “brain” idea becomes concrete: not just retrieval, but retrieval plus memory plus durable execution.</p>

<p>The broader trend line is pretty clear, too. Agent frameworks are converging on durable state and memory. Retrieval models are getting richer with <a href="https://docs.voyageai.com/docs/contextualized-chunk-embeddings">contextualized chunk embeddings</a>, multimodal embeddings, and better rerankers. MongoDB Atlas is moving retrieval models and database capabilities closer together. The winning application architecture is the one that can absorb those changes without forcing you to rebuild your data layer every few months. MongoDB and Voyage AI fit that direction unusually well.</p>]]></content><author><name></name></author><category term="AI" /><category term="mongodb" /><category term="ai" /><category term="langchain" /><category term="haystack" /><category term="llamaindex" /><category term="voyage" /><summary type="html"><![CDATA[Before you add planners, crews, or graph-shaped orchestration, build the part that decides what the model should actually see. In this first post, we’ll start an enterprise support copilot and give it the one capability every future agent depends on: retrieval that doesn’t fall apart in production.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.alexbevi.com/images/mongodb-ai-banner.png" /><media:content medium="image" url="https://www.alexbevi.com/images/mongodb-ai-banner.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Persistent multi-agent conversations with the OpenAI Agents SDK and MongoDB</title><link href="https://www.alexbevi.com/blog/2026/04/27/persistent-multi-agent-conversations-with-the-openai-agents-sdk-and-mongodb/" rel="alternate" type="text/html" title="Persistent multi-agent conversations with the OpenAI Agents SDK and MongoDB" 
/><published>2026-04-27T12:50:00-04:00</published><updated>2026-04-29T09:45:29-04:00</updated><id>https://www.alexbevi.com/blog/2026/04/27/persistent-multi-agent-conversations-with-the-openai-agents-sdk-and-mongodb</id><content type="html" xml:base="https://www.alexbevi.com/blog/2026/04/27/persistent-multi-agent-conversations-with-the-openai-agents-sdk-and-mongodb/"><![CDATA[<p><em>Version 0.14.2 added a <code class="language-plaintext highlighter-rouge">MongoDBSession</code> backend; here’s a working multi-agent customer-support demo that uses it, and the documents it leaves behind.</em></p>

<p>The OpenAI Agents SDK has shipped session backends for SQLite, SQLAlchemy, Redis, and Dapr for a while now. With <strong>0.14.2</strong> (April 2026), <a href="https://openai.github.io/openai-agents-python/sessions/"><code class="language-plaintext highlighter-rouge">MongoDBSession</code> joined that list</a>, and 0.14.6 added the docs page. If you’re already running MongoDB for application data, this is the moment to stop standing up a second store just to remember what the agent said three turns ago. The demo for this walkthrough is a small e-commerce support app with three handoff-connected agents and one MongoDB instance behind everything: customers, orders, support articles, <strong>and</strong> the conversation history.</p>

<p>Repo: <a href="https://github.com/alexbevi/mongodb-openai-agents-sdk-example">https://github.com/alexbevi/mongodb-openai-agents-sdk-example</a>.</p>

<h2 id="what-youll-build">What you’ll build</h2>

<p>A CLI customer-support agent that identifies the user from MongoDB, hands off between a triage agent, an order-support agent, and a knowledge-base agent, and persists every turn (user message, tool call, tool output, assistant reply, handoff) to MongoDB via <code class="language-plaintext highlighter-rouge">MongoDBSession</code>. You quit, restart the process, log in with the same email, and the agent picks up the thread — no re-explaining the return you started yesterday.</p>

<h2 id="why-mongodb-for-sessions">Why MongoDB for sessions</h2>

<p>A session backend has three jobs: store one item per turn, return them in order on the next run, and not corrupt itself when two processes write at once. The interesting part for MongoDB is how naturally each of those maps to things the database already does.</p>

<p><strong>Items in a session are heterogeneous.</strong> A turn can be a user message, a tool call, a tool result, an assistant message, or a handoff record — each with its own shape. A document store takes those payloads as-is. There’s no <code class="language-plaintext highlighter-rouge">messages</code> table you have to migrate every time the SDK adds a new run-item type, and no JSON column to parse around.</p>

<p><strong>Ordering is the hard part, and <code class="language-plaintext highlighter-rouge">$inc</code> is built for it.</strong> <code class="language-plaintext highlighter-rouge">MongoDBSession</code> stamps each message with a monotonically increasing <code class="language-plaintext highlighter-rouge">seq</code> counter — the SDK docs call this out explicitly: it preserves ordering across concurrent writers and processes. That’s a single-document atomic increment, not a distributed lock or an optimistic-retry loop. Two FastAPI workers handling the same <code class="language-plaintext highlighter-rouge">session_id</code> won’t interleave.</p>
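
<p>To illustrate the pattern (this is a sketch of the general technique, not the SDK’s actual code), an atomic per-session counter can be reserved with a single <code class="language-plaintext highlighter-rouge">find_one_and_update</code> call. The function name and the document layout are assumptions, and the example uses the synchronous PyMongo call shape for brevity.</p>

```python
def next_sequence_number(sessions, session_id):
    """Atomically reserve the next seq value for one session."""
    doc = sessions.find_one_and_update(
        {"_id": session_id},
        # $inc creates the field at 1 on first use and adds 1 afterwards.
        {"$inc": {"seq": 1}},
        # Create the session document if it does not exist yet.
        upsert=True,
        # True is the value of pymongo.ReturnDocument.AFTER: return the
        # document as it looks after the increment.
        return_document=True,
    )
    return doc["seq"]


# Hypothetical usage, assuming `db` is a PyMongo database:
# seq = next_sequence_number(db["agent_sessions"], "customer-42")
```

<p>Because the increment and the read happen in one server-side operation on one document, two writers can never be handed the same number for the same session.</p>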

<p><strong>One store, one connection pool.</strong> This is the angle the demo actually showcases. The <code class="language-plaintext highlighter-rouge">ecommerce_support</code> database holds <code class="language-plaintext highlighter-rouge">customers</code>, <code class="language-plaintext highlighter-rouge">orders</code>, and <code class="language-plaintext highlighter-rouge">support_articles</code> <em>next to</em> <code class="language-plaintext highlighter-rouge">agent_sessions</code> and <code class="language-plaintext highlighter-rouge">agent_messages</code>. Tools query operational data, the SDK persists turns, and they share the same <code class="language-plaintext highlighter-rouge">AsyncMongoClient</code>. Adding session memory cost zero new infrastructure.</p>

<h2 id="walkthrough">Walkthrough</h2>

<h3 id="1-prerequisites">1. Prerequisites</h3>

<p>Python 3.10+, an OpenAI API key, and either a local <code class="language-plaintext highlighter-rouge">mongod</code> or a <a href="https://www.mongodb.com/cloud/atlas/register">MongoDB Atlas</a> cluster. Nothing in the demo requires Atlas-only features, so a <code class="language-plaintext highlighter-rouge">mongod</code> listening on <code class="language-plaintext highlighter-rouge">localhost:27017</code> is fine.</p>

<h3 id="2-install">2. Install</h3>

<p><code class="language-plaintext highlighter-rouge">requirements.txt</code> declares the new extra with a minimum version:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre>openai-agents[mongodb]&gt;=0.14.2
python-dotenv&gt;=1.0.0
pymongo&gt;=4.13
</pre></td></tr></tbody></table></code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>pip <span class="nb">install</span> <span class="nt">-r</span> requirements.txt
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">[mongodb]</code> extra pulls in <code class="language-plaintext highlighter-rouge">pymongo</code>’s async client; the <code class="language-plaintext highlighter-rouge">MongoDBSession</code> class lives at <code class="language-plaintext highlighter-rouge">agents.extensions.memory.MongoDBSession</code>.</p>

<h3 id="3-connect">3. Connect</h3>

<p>The demo uses one shared <code class="language-plaintext highlighter-rouge">AsyncMongoClient</code> per process (the right pattern — sessions don’t own the client, they share its pool):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
</pre></td><td class="rouge-code"><pre><span class="kn">import</span> <span class="n">os</span>

<span class="kn">from</span> <span class="n">pymongo.asynchronous.mongo_client</span> <span class="kn">import</span> <span class="n">AsyncMongoClient</span>

<span class="n">MONGODB_URI</span> <span class="o">=</span> <span class="n">os</span><span class="p">.</span><span class="n">environ</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">MONGODB_URI</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">mongodb://localhost:27017</span><span class="sh">"</span><span class="p">)</span>
<span class="n">DB_NAME</span> <span class="o">=</span> <span class="sh">"</span><span class="s">ecommerce_support</span><span class="sh">"</span>

<span class="n">mongo_client</span> <span class="o">=</span> <span class="nc">AsyncMongoClient</span><span class="p">(</span><span class="n">MONGODB_URI</span><span class="p">)</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">mongo_client</span><span class="p">[</span><span class="n">DB_NAME</span><span class="p">]</span>

<span class="k">try</span><span class="p">:</span>
    <span class="k">await</span> <span class="n">mongo_client</span><span class="p">.</span><span class="n">admin</span><span class="p">.</span><span class="nf">command</span><span class="p">(</span><span class="sh">"</span><span class="s">ping</span><span class="sh">"</span><span class="p">)</span>
<span class="k">except</span> <span class="nb">Exception</span> <span class="k">as</span> <span class="n">exc</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="sa">f</span><span class="sh">"</span><span class="se">\n</span><span class="s">Cannot connect to MongoDB (</span><span class="si">{</span><span class="n">MONGODB_URI</span><span class="si">}</span><span class="s">):</span><span class="se">\n</span><span class="s">  </span><span class="si">{</span><span class="n">exc</span><span class="si">}</span><span class="sh">"</span><span class="p">)</span>
    <span class="k">return</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<h3 id="4-seed-and-identify">4. Seed and identify</h3>

<p><code class="language-plaintext highlighter-rouge">python seed_data.py</code> loads three demo customers, five products, five orders with embedded line items, and seven support articles indexed for <code class="language-plaintext highlighter-rouge">$text</code> search. Then <code class="language-plaintext highlighter-rouge">main.py</code> looks the customer up so the triage agent doesn’t have to ask for an email it already knows.</p>

<h3 id="5-instantiate-the-session">5. Instantiate the session</h3>

<p>This is the integration:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre><span class="n">session_id</span> <span class="o">=</span> <span class="sa">f</span><span class="sh">"</span><span class="s">support_</span><span class="si">{</span><span class="n">email</span><span class="p">.</span><span class="nf">replace</span><span class="p">(</span><span class="sh">'</span><span class="s">@</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">_at_</span><span class="sh">'</span><span class="p">).</span><span class="nf">replace</span><span class="p">(</span><span class="sh">'</span><span class="s">.</span><span class="sh">'</span><span class="p">,</span> <span class="sh">'</span><span class="s">_</span><span class="sh">'</span><span class="p">)</span><span class="si">}</span><span class="sh">"</span>
<span class="n">session</span> <span class="o">=</span> <span class="nc">MongoDBSession</span><span class="p">(</span>
    <span class="n">session_id</span><span class="o">=</span><span class="n">session_id</span><span class="p">,</span>
    <span class="n">client</span><span class="o">=</span><span class="n">mongo_client</span><span class="p">,</span>
    <span class="n">database</span><span class="o">=</span><span class="n">DB_NAME</span><span class="p">,</span>
<span class="p">)</span>

<span class="k">if</span> <span class="ow">not</span> <span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="nf">ping</span><span class="p">():</span>
    <span class="nf">print</span><span class="p">(</span><span class="sh">"</span><span class="s">Warning: MongoDB session storage is unavailable.</span><span class="sh">"</span><span class="p">)</span>

<span class="n">existing</span> <span class="o">=</span> <span class="k">await</span> <span class="n">session</span><span class="p">.</span><span class="nf">get_items</span><span class="p">()</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Constructing with <code class="language-plaintext highlighter-rouge">client=</code> (rather than <code class="language-plaintext highlighter-rouge">MongoDBSession.from_uri(...)</code>) means the session shares the app’s connection pool and <code class="language-plaintext highlighter-rouge">session.close()</code> becomes a no-op — the lifecycle stays with you. <code class="language-plaintext highlighter-rouge">session.ping()</code> is a real round-trip against MongoDB, useful for liveness probes.</p>
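<p>The session ID is what ties a returning customer to their stored history, so the derivation is worth isolating. A small sketch follows; the helper name and the strip/lowercase normalization are additions for illustration and are not part of the demo.</p>

```python
def session_id_for(email: str) -> str:
    """Derive a stable session ID from a customer email.

    Mirrors the f-string used in the demo. The strip/lowercase step is an
    extra assumption so that 'Alice@Example.com' maps to the same session.
    """
    normalized = email.strip().lower()
    return "support_" + normalized.replace("@", "_at_").replace(".", "_")


print(session_id_for("alice@example.com"))  # support_alice_at_example_com
```

<p>This mapping is readable but not collision-free: <code class="language-plaintext highlighter-rouge">a.b@x.com</code> and <code class="language-plaintext highlighter-rouge">a_b@x.com</code> produce the same ID. That is fine for a demo; a production app would more likely key sessions on an internal customer or conversation identifier.</p>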

<h3 id="6-run">6. Run</h3>

<p>Pass <code class="language-plaintext highlighter-rouge">session=</code> to the runner. Everything else is the same SDK you already know:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="k">with</span> <span class="nf">trace</span><span class="p">(</span><span class="sh">"</span><span class="s">Customer Support</span><span class="sh">"</span><span class="p">,</span> <span class="n">group_id</span><span class="o">=</span><span class="n">conversation_id</span><span class="p">):</span>
    <span class="n">result</span> <span class="o">=</span> <span class="k">await</span> <span class="n">Runner</span><span class="p">.</span><span class="nf">run</span><span class="p">(</span>
        <span class="n">current_agent</span><span class="p">,</span>
        <span class="nb">input</span><span class="o">=</span><span class="n">user_input</span><span class="p">,</span>
        <span class="n">context</span><span class="o">=</span><span class="n">ctx</span><span class="p">,</span>
        <span class="n">session</span><span class="o">=</span><span class="n">session</span><span class="p">,</span>   <span class="c1"># MongoDB stores every turn automatically
</span>    <span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Have a conversation, <code class="language-plaintext highlighter-rouge">quit</code>, run <code class="language-plaintext highlighter-rouge">python main.py</code> again with the same email, and the next message gets the full prior context prepended automatically.</p>

<h2 id="what-mongodb-actually-stored">What MongoDB actually stored</h2>

<p>After a few turns with <code class="language-plaintext highlighter-rouge">alice@example.com</code>, two collections show up in the <code class="language-plaintext highlighter-rouge">ecommerce_support</code> database. The interesting one is <code class="language-plaintext highlighter-rouge">agent_messages</code>. A representative document, abridged:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre><span class="p">{</span>
  <span class="nl">_id</span><span class="p">:</span> <span class="nc">ObjectId</span><span class="p">(</span><span class="dl">"</span><span class="s2">6620d1f4...</span><span class="dl">"</span><span class="p">),</span>
  <span class="nx">session_id</span><span class="p">:</span> <span class="dl">"</span><span class="s2">support_alice_at_example_com</span><span class="dl">"</span><span class="p">,</span>   <span class="c1">// partition key for this conversation</span>
  <span class="nx">seq</span><span class="p">:</span> <span class="mi">7</span><span class="p">,</span>                                        <span class="c1">// monotonically increasing turn order</span>
  <span class="nx">message_data</span><span class="p">:</span> <span class="p">{</span>                                <span class="c1">// the SDK's run-item, stored as-is</span>
    <span class="nl">type</span><span class="p">:</span> <span class="dl">"</span><span class="s2">function_call_output</span><span class="dl">"</span><span class="p">,</span>
    <span class="nx">call_id</span><span class="p">:</span> <span class="dl">"</span><span class="s2">call_8b2...</span><span class="dl">"</span><span class="p">,</span>
    <span class="nx">output</span><span class="p">:</span> <span class="dl">"</span><span class="s2">Return initiated for order ORD-1001.</span><span class="se">\n</span><span class="s2">Reason: Not powerful enough...</span><span class="se">\n</span><span class="s2">Estimated refund: $1,484.98 (includes 10% Gold loyalty bonus)</span><span class="dl">"</span>
  <span class="p">},</span>
  <span class="nx">created_at</span><span class="p">:</span> <span class="nc">ISODate</span><span class="p">(</span><span class="dl">"</span><span class="s2">2026-04-26T19:14:08.221Z</span><span class="dl">"</span><span class="p">)</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Three fields earn their keep:</p>

<ul>
  <li><strong><code class="language-plaintext highlighter-rouge">session_id</code></strong> is the only field every read filters on. It’s the partition key for “this conversation.”</li>
  <li><strong><code class="language-plaintext highlighter-rouge">seq</code></strong> is the integer that makes ordering deterministic. The SDK reads with <code class="language-plaintext highlighter-rouge">sort({ seq: 1 })</code> and writes with an atomic <code class="language-plaintext highlighter-rouge">$inc</code> against the matching <code class="language-plaintext highlighter-rouge">agent_sessions</code> document, which is what makes concurrent workers safe without a distributed lock.</li>
  <li><strong><code class="language-plaintext highlighter-rouge">message_data</code></strong> is the SDK’s run-item — a user message, tool call, tool output, assistant message, or handoff. Different shape every time. The document model just stores it.</li>
</ul>

<p><code class="language-plaintext highlighter-rouge">agent_sessions</code> holds one document per <code class="language-plaintext highlighter-rouge">session_id</code> with the current high-water <code class="language-plaintext highlighter-rouge">seq</code> and timestamps — that’s the counter <code class="language-plaintext highlighter-rouge">$inc</code> operates on.</p>

<p>The SDK creates its indexes on first use (per the <a href="https://openai.github.io/openai-agents-python/sessions/">sessions docs</a>). You’ll see a compound index on <code class="language-plaintext highlighter-rouge">(session_id, seq)</code> on <code class="language-plaintext highlighter-rouge">agent_messages</code> (the only access pattern the SDK has — fetch ordered history for one session) and a unique index on <code class="language-plaintext highlighter-rouge">session_id</code> in <code class="language-plaintext highlighter-rouge">agent_sessions</code>.</p>
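<p>If you want to inspect a conversation outside the SDK, the same access pattern is easy to reproduce by hand. The sketch below assumes the document shape shown earlier (<code class="language-plaintext highlighter-rouge">session_id</code>, <code class="language-plaintext highlighter-rouge">seq</code>, <code class="language-plaintext highlighter-rouge">message_data</code>) and a PyMongo async collection; <code class="language-plaintext highlighter-rouge">load_history</code> is a hypothetical helper, not an SDK function.</p>

```python
from typing import Any


async def load_history(messages: Any, session_id: str) -> list[dict[str, Any]]:
    """Return the stored run-items for one session in seq order.

    `messages` is expected to be a PyMongo async collection holding
    documents shaped like the `agent_messages` example.
    """
    # Filter on session_id and sort on seq: the query shape that a
    # compound (session_id, seq) index is designed to serve.
    cursor = messages.find({"session_id": session_id}).sort("seq", 1)
    return [doc["message_data"] async for doc in cursor]
```

<p>Inside the demo’s async entry point this would be called as <code class="language-plaintext highlighter-rouge">await load_history(db["agent_messages"], session_id)</code>.</p>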

<h2 id="production-notes">Production notes</h2>

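<p>For Atlas, swap the URI for <code class="language-plaintext highlighter-rouge">mongodb+srv://...</code>; <code class="language-plaintext highlighter-rouge">MongoDBSession</code> accepts it without any other change. If abandoned conversations accumulate, add a TTL index on <code class="language-plaintext highlighter-rouge">agent_messages.created_at</code> and old turns retire on their own. Note that TTL expiry is per document: it removes individual messages by age, including the oldest turns of a conversation that is still active, and it leaves the matching <code class="language-plaintext highlighter-rouge">agent_sessions</code> document in place.</p>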
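<p>A minimal sketch of creating that TTL index, assuming a PyMongo async collection and a 30-day retention window. Both the helper name and the window are illustrative choices, not values from the demo.</p>

```python
from typing import Any

THIRTY_DAYS = 60 * 60 * 24 * 30  # retention window in seconds; pick your own


async def add_message_ttl(messages: Any, seconds: int = THIRTY_DAYS) -> str:
    """Create a TTL index so documents expire `seconds` after `created_at`.

    `messages` is expected to be a PyMongo async collection. TTL only
    applies to documents whose `created_at` holds a BSON date.
    """
    # create_index returns the name of the index it created (or found).
    return await messages.create_index("created_at", expireAfterSeconds=seconds)
```

<p>Expiry is handled by a background task on the server that runs periodically, so documents disappear shortly after they pass the window rather than at the exact second.</p>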

<p>Connection lifetime matters: keep one <code class="language-plaintext highlighter-rouge">AsyncMongoClient</code> per process, construct <code class="language-plaintext highlighter-rouge">MongoDBSession(client=...)</code> per request, and let the Runner do the rest. Don’t reach for <code class="language-plaintext highlighter-rouge">MongoDBSession.from_uri(...)</code> in a web handler — it builds and tears down a client every call. The session needs read/write on the two configured collections (defaults <code class="language-plaintext highlighter-rouge">agent_sessions</code> and <code class="language-plaintext highlighter-rouge">agent_messages</code>, both overridable via <code class="language-plaintext highlighter-rouge">sessions_collection=</code> and <code class="language-plaintext highlighter-rouge">messages_collection=</code>). The <code class="language-plaintext highlighter-rouge">seq</code> counter keeps concurrent writers safe, but fanning the same <code class="language-plaintext highlighter-rouge">session_id</code> across processes will interleave their turns — safe, but probably not what the user meant.</p>

<h2 id="try-it-yourself">Try it yourself</h2>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
</pre></td><td class="rouge-code"><pre>git clone https://github.com/alexbevi/mongodb-openai-agents-sdk-example
<span class="nb">cd </span>mongodb-openai-agents-sdk-example
pip <span class="nb">install</span> <span class="nt">-r</span> requirements.txt
<span class="nb">cp </span>env.example .env          <span class="c"># set OPENAI_API_KEY and MONGODB_URI</span>
python seed_data.py
python main.py
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Environment variables: <code class="language-plaintext highlighter-rouge">OPENAI_API_KEY</code> (required) and <code class="language-plaintext highlighter-rouge">MONGODB_URI</code> (optional; defaults to <code class="language-plaintext highlighter-rouge">mongodb://localhost:27017</code>). Demo accounts: <code class="language-plaintext highlighter-rouge">alice@example.com</code> (Gold), <code class="language-plaintext highlighter-rouge">bob@example.com</code> (Standard), <code class="language-plaintext highlighter-rouge">carol@example.com</code> (Platinum).</p>

<h2 id="where-to-go-next">Where to go next</h2>

<ul>
  <li>The full session API surface — <code class="language-plaintext highlighter-rouge">get_items</code>, <code class="language-plaintext highlighter-rouge">add_items</code>, <code class="language-plaintext highlighter-rouge">pop_item</code>, <code class="language-plaintext highlighter-rouge">clear_session</code>, <code class="language-plaintext highlighter-rouge">ping</code> — is documented in the <a href="https://openai.github.io/openai-agents-python/sessions/">Sessions overview</a>, including the MongoDB-specific notes on collection naming and Atlas URIs.</li>
  <li>Wrap your <code class="language-plaintext highlighter-rouge">MongoDBSession</code> in <a href="https://openai.github.io/openai-agents-python/sessions/"><code class="language-plaintext highlighter-rouge">OpenAIResponsesCompactionSession</code></a> once threads grow long; it summarizes old turns server-side and rewrites the underlying session.</li>
  <li>The natural next MongoDB feature for this demo is <a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/">Atlas Vector Search</a> — store embeddings on <code class="language-plaintext highlighter-rouge">support_articles</code> and replace the <code class="language-plaintext highlighter-rouge">$text</code> query in <code class="language-plaintext highlighter-rouge">search_knowledge_base</code> with <code class="language-plaintext highlighter-rouge">$vectorSearch</code>. Same database, same client, one new index.</li>
</ul>]]></content><author><name></name></author><category term="AI" /><category term="mongodb" /><category term="ai" /><category term="python" /><summary type="html"><![CDATA[Version 0.14.2 added a MongoDBSession backend; here’s a working multi-agent customer-support demo that uses it, and the documents it leaves behind.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.alexbevi.com/images/mongodb-ai.png" /><media:content medium="image" url="https://www.alexbevi.com/images/mongodb-ai.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">MongoDB as the Brain of Modern AI Applications</title><link href="https://www.alexbevi.com/blog/2026/04/15/mongodb-as-the-brain-of-modern-ai-applications/" rel="alternate" type="text/html" title="MongoDB as the Brain of Modern AI Applications" /><published>2026-04-15T16:42:15-04:00</published><updated>2026-04-29T09:45:29-04:00</updated><id>https://www.alexbevi.com/blog/2026/04/15/mongodb-as-the-brain-of-modern-ai-applications</id><content type="html" xml:base="https://www.alexbevi.com/blog/2026/04/15/mongodb-as-the-brain-of-modern-ai-applications/"><![CDATA[<p>Production agents need two persistence layers: thread-scoped state and cross-session memory. <a href="https://adk.dev/sessions/session/">Google ADK Sessions</a> stores <code class="language-plaintext highlighter-rouge">events</code> and <code class="language-plaintext highlighter-rouge">state</code> for a single conversation, while <a href="https://adk.dev/sessions/memory/"><code class="language-plaintext highlighter-rouge">MemoryService</code></a> handles recall across sessions. 
<a href="https://docs.langchain.com/oss/python/langgraph/add-memory">LangGraph memory</a> makes the same split with a checkpointer for short-term memory and a store for long-term memory, and <a href="https://docs.langchain.com/oss/python/langchain/long-term-memory">LangChain long-term memory</a> builds on LangGraph stores that persist JSON documents by namespace and key. The memory architecture has already converged.</p>

<p>Durable memory is not raw chat replay. <a href="https://docs.cloud.google.com/agent-builder/agent-engine/memory-bank/overview">Vertex AI Memory Bank</a> is built for identity-scoped, cross-session personalization and LLM-driven knowledge extraction, and <a href="https://cloud.google.com/blog/topics/developers-practitioners/remember-this-agent-state-and-memory-with-adk">Google’s ADK memory write-up</a> describes Memory Bank as extracting key information from session data rather than replaying every turn. <a href="https://docs.langchain.com/oss/python/concepts/memory">LangChain’s memory model</a> is equally explicit: long-term memory can be semantic (facts), episodic (past actions), or procedural (rules and prompts).</p>

<p>Agent memory should be structured data, not opaque blobs. <a href="https://docs.langchain.com/oss/python/langchain/long-term-memory">LangChain stores</a> persist long-term memory as JSON documents, and ADK’s <a href="https://adk.dev/sessions/session/migrate/"><code class="language-plaintext highlighter-rouge">DatabaseSessionService</code> migration</a> moved session serialization from pickle-based storage to JSON-based storage in v1.22.0. MongoDB’s document model matches that reality directly.</p>

<p>MongoDB is a strong fit because retrieval lives in the same system as the memory. <a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/">MongoDB Vector Search</a> supports both approximate and exact nearest-neighbor search, and the default index type is <a href="https://www.mongodb.com/docs/atlas/atlas-search/field-types/vector-type/">HNSW</a>. <a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/">Vector search pre-filters</a> let you scope recall by fields like <code class="language-plaintext highlighter-rouge">user_id</code>, <code class="language-plaintext highlighter-rouge">tenant_id</code>, or <code class="language-plaintext highlighter-rouge">memory_type</code> before embeddings are compared. <a href="https://www.mongodb.com/docs/atlas/ai-integrations/langchain/hybrid-search/">Hybrid search</a> combines vector and full-text retrieval with reciprocal rank fusion, which is exactly what memory needs when the data mixes natural language with exact identifiers like invoice IDs, feature flags, or product SKUs.</p>
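<p>Reciprocal rank fusion itself is simple enough to sketch in a few lines. The version below is a plain-Python illustration of the scoring idea, not MongoDB’s implementation; the constant <code class="language-plaintext highlighter-rouge">k=60</code> and the memory IDs are assumptions made up for the example.</p>

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists into one, best first.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used constant, assumed here.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score, breaking ties by ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vector_hits = ["mem-7", "mem-2", "mem-9"]  # hypothetical semantic results
text_hits = ["mem-2", "mem-4", "mem-7"]    # hypothetical full-text results
fused = reciprocal_rank_fusion([vector_hits, text_hits])
print([doc_id for doc_id, _ in fused])  # ['mem-2', 'mem-7', 'mem-4', 'mem-9']
```

<p>Documents that appear in both lists (<code class="language-plaintext highlighter-rouge">mem-2</code>, <code class="language-plaintext highlighter-rouge">mem-7</code>) rise above documents that only one retriever found, which is the behavior you want when a query mixes natural language with exact identifiers.</p>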

<p>Memory also needs retention and update policy. <a href="https://www.mongodb.com/docs/manual/core/index-ttl/">TTL indexes</a> automatically expire session artifacts, scratchpads, or short-lived summaries. <a href="https://www.mongodb.com/docs/manual/changestreams/">Change streams</a> give you a real-time feed of inserts and updates, which is the right trigger for summarization, entity extraction, or memory distillation jobs. When the data is relationship-heavy instead of chunk-heavy, <a href="https://www.mongodb.com/docs/atlas/ai-integrations/langchain/graph-rag/">GraphRAG on MongoDB</a> uses entities, edges, and <code class="language-plaintext highlighter-rouge">$graphLookup</code> for relationship-aware, multi-hop retrieval.</p>

<p>This is not a narrow LangChain story. MongoDB publishes integrations for <a href="https://www.mongodb.com/docs/atlas/ai-integrations/langgraph/">LangGraph</a>, <a href="https://www.mongodb.com/docs/atlas/ai-integrations/langchain/">LangChain</a>, <a href="https://developers.llamaindex.ai/python/framework/integrations/vector_stores/mongodbatlasvectorsearch/">LlamaIndex</a>, <a href="https://www.mongodb.com/docs/atlas/ai-integrations/">Semantic Kernel</a>, <a href="https://www.mongodb.com/docs/atlas/ai-integrations/">Haystack</a>, <a href="https://www.mongodb.com/docs/atlas/ai-integrations/">Spring AI</a>, <a href="https://www.mongodb.com/docs/atlas/ai-integrations/">CrewAI</a>, and <a href="https://www.mongodb.com/docs/atlas/ai-integrations/">Vertex AI</a>. <a href="https://developers.llamaindex.ai/python/framework/module_guides/storing/docstores/">LlamaIndex</a> can use MongoDB for the vector store, document store, and index store. <a href="https://docs.mem0.ai/components/vectordbs/dbs/mongodb">Mem0</a> also supports MongoDB as a memory backend. MongoDB fits the storage contract these frameworks keep converging on: structured documents plus semantic, lexical, and graph-based retrieval.</p>

<h2 id="langgraph-short-term-checkpoints-and-long-term-memory-in-one-database">LangGraph: short-term checkpoints and long-term memory in one database</h2>

<p><a href="https://www.mongodb.com/docs/atlas/ai-integrations/langgraph/">MongoDB’s LangGraph integration</a> exposes <code class="language-plaintext highlighter-rouge">MongoDBSaver</code> for checkpoints and <code class="language-plaintext highlighter-rouge">MongoDBStore</code> for durable memory, with optional vector indexing and TTL-based expiry. That maps directly to LangGraph’s own split between thread persistence and store-backed recall.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
</pre></td><td class="rouge-code"><pre><span class="kn">from</span> <span class="n">pymongo</span> <span class="kn">import</span> <span class="n">MongoClient</span>
<span class="kn">from</span> <span class="n">langgraph.checkpoint.mongodb</span> <span class="kn">import</span> <span class="n">MongoDBSaver</span>
<span class="kn">from</span> <span class="n">langgraph.store.mongodb</span> <span class="kn">import</span> <span class="n">MongoDBStore</span><span class="p">,</span> <span class="n">create_vector_index_config</span>
<span class="kn">from</span> <span class="n">langchain_openai</span> <span class="kn">import</span> <span class="n">OpenAIEmbeddings</span>

<span class="n">MONGODB_URI</span> <span class="o">=</span> <span class="sh">"</span><span class="s">&lt;connection-string&gt;</span><span class="sh">"</span>

<span class="c1"># Short-term memory: thread checkpoints
</span><span class="n">client</span> <span class="o">=</span> <span class="nc">MongoClient</span><span class="p">(</span><span class="n">MONGODB_URI</span><span class="p">)</span>
<span class="n">checkpointer</span> <span class="o">=</span> <span class="nc">MongoDBSaver</span><span class="p">(</span><span class="n">client</span><span class="p">)</span>

<span class="c1"># Long-term memory: semantic store with metadata filters
</span><span class="n">index_config</span> <span class="o">=</span> <span class="nf">create_vector_index_config</span><span class="p">(</span>
    <span class="n">embed</span><span class="o">=</span><span class="nc">OpenAIEmbeddings</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="sh">"</span><span class="s">text-embedding-3-small</span><span class="sh">"</span><span class="p">),</span>
    <span class="n">dims</span><span class="o">=</span><span class="mi">1536</span><span class="p">,</span>
    <span class="n">fields</span><span class="o">=</span><span class="p">[</span><span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">],</span>
    <span class="n">filters</span><span class="o">=</span><span class="p">[</span><span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">memory_type</span><span class="sh">"</span><span class="p">],</span>
<span class="p">)</span>

<span class="c1"># Assume `builder` is an existing LangGraph StateGraph
</span><span class="k">with</span> <span class="n">MongoDBStore</span><span class="p">.</span><span class="nf">from_conn_string</span><span class="p">(</span>
    <span class="n">conn_string</span><span class="o">=</span><span class="n">MONGODB_URI</span><span class="p">,</span>
    <span class="n">db_name</span><span class="o">=</span><span class="sh">"</span><span class="s">agent_memory</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">collection_name</span><span class="o">=</span><span class="sh">"</span><span class="s">memories</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">index_config</span><span class="o">=</span><span class="n">index_config</span><span class="p">,</span>
    <span class="n">ttl_config</span><span class="o">=</span><span class="p">{</span>
        <span class="sh">"</span><span class="s">default_ttl</span><span class="sh">"</span><span class="p">:</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">60</span> <span class="o">*</span> <span class="mi">24</span> <span class="o">*</span> <span class="mi">30</span><span class="p">,</span>   <span class="c1"># 30 days
</span>        <span class="sh">"</span><span class="s">refresh_on_read</span><span class="sh">"</span><span class="p">:</span> <span class="bp">True</span><span class="p">,</span>
    <span class="p">},</span>
<span class="p">)</span> <span class="k">as</span> <span class="n">store</span><span class="p">:</span>
    <span class="n">graph</span> <span class="o">=</span> <span class="n">builder</span><span class="p">.</span><span class="nf">compile</span><span class="p">(</span><span class="n">checkpointer</span><span class="o">=</span><span class="n">checkpointer</span><span class="p">,</span> <span class="n">store</span><span class="o">=</span><span class="n">store</span><span class="p">)</span>

    <span class="n">store</span><span class="p">.</span><span class="nf">put</span><span class="p">(</span>
        <span class="n">namespace</span><span class="o">=</span><span class="p">(</span><span class="sh">"</span><span class="s">user-42</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">memories</span><span class="sh">"</span><span class="p">),</span>
        <span class="n">key</span><span class="o">=</span><span class="sh">"</span><span class="s">pref:vegan:soho</span><span class="sh">"</span><span class="p">,</span>
        <span class="n">value</span><span class="o">=</span><span class="p">{</span>
            <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">User prefers vegan restaurants near SoHo.</span><span class="sh">"</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user-42</span><span class="sh">"</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">memory_type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">semantic</span><span class="sh">"</span><span class="p">,</span>
        <span class="p">},</span>
    <span class="p">)</span>

    <span class="n">results</span> <span class="o">=</span> <span class="n">store</span><span class="p">.</span><span class="nf">search</span><span class="p">(</span>
        <span class="p">(</span><span class="sh">"</span><span class="s">user-42</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">memories</span><span class="sh">"</span><span class="p">),</span>
        <span class="n">query</span><span class="o">=</span><span class="sh">"</span><span class="s">Where should I book dinner tonight?</span><span class="sh">"</span><span class="p">,</span>
        <span class="n">limit</span><span class="o">=</span><span class="mi">3</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="k">for</span> <span class="n">result</span> <span class="ow">in</span> <span class="n">results</span><span class="p">:</span>
        <span class="nf">print</span><span class="p">(</span><span class="n">result</span><span class="p">.</span><span class="n">value</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This is the clean MongoDB story: checkpoints for working memory, a store for long-term memory, vector retrieval for recall, metadata filters for isolation, and TTL for automatic cleanup. The same database handles all of it.</p>
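<p>To make the TTL part of that list concrete, here is a minimal sketch of how automatic cleanup could look at the PyMongo level. The <code class="language-plaintext highlighter-rouge">expires_at</code> field and the helper names <code class="language-plaintext highlighter-rouge">build_expiring_memory</code> and <code class="language-plaintext highlighter-rouge">ensure_ttl_index</code> are hypothetical and are not part of LangGraph or its MongoDB store. The sketch assumes you manage the collection yourself and pass in a PyMongo collection object.</p>

```python
from datetime import datetime, timedelta, timezone


def build_expiring_memory(user_id, content, ttl_days):
    """Build a memory document that carries its own expiry timestamp.

    The field names are illustrative assumptions, not a framework schema.
    """
    now = datetime.now(timezone.utc)
    return {
        "user_id": user_id,
        "content": content,
        "created_at": now,
        # A TTL index on this field lets MongoDB delete the document later.
        "expires_at": now + timedelta(days=ttl_days),
    }


def ensure_ttl_index(collection):
    """Create a TTL index on expires_at for a PyMongo collection.

    expireAfterSeconds=0 means each document expires at the exact time
    stored in its own expires_at field.
    """
    return collection.create_index("expires_at", expireAfterSeconds=0)
```

<p>Storing the expiry on each document, rather than a fixed lifetime on the index, lets different memory types age out at different rates. Deletion is handled by a background task, so expired documents can linger briefly after their timestamp passes.</p>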

<h2 id="langchain-chat-history-plus-hybrid-recall">LangChain: chat history plus hybrid recall</h2>

<p>At the LangChain layer, MongoDB covers both conversation state and retrieval. <a href="https://www.mongodb.com/docs/atlas/ai-integrations/langchain/"><code class="language-plaintext highlighter-rouge">MongoDBChatMessageHistory</code></a> persists per-session message history, <a href="https://www.mongodb.com/docs/atlas/ai-integrations/langchain/"><code class="language-plaintext highlighter-rouge">MongoDBAtlasVectorSearch</code></a> stores semantic memories, and <a href="https://www.mongodb.com/docs/atlas/ai-integrations/langchain/hybrid-search/"><code class="language-plaintext highlighter-rouge">MongoDBAtlasHybridSearchRetriever</code></a> fuses lexical and semantic recall.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="kn">from</span> <span class="n">langchain_core.documents</span> <span class="kn">import</span> <span class="n">Document</span>
<span class="kn">from</span> <span class="n">langchain_mongodb.chat_message_histories</span> <span class="kn">import</span> <span class="n">MongoDBChatMessageHistory</span>
<span class="kn">from</span> <span class="n">langchain_mongodb.retrievers.hybrid_search</span> <span class="kn">import</span> <span class="n">MongoDBAtlasHybridSearchRetriever</span>
<span class="kn">from</span> <span class="n">langchain_mongodb.vectorstores</span> <span class="kn">import</span> <span class="n">MongoDBAtlasVectorSearch</span>
<span class="kn">from</span> <span class="n">langchain_openai</span> <span class="kn">import</span> <span class="n">OpenAIEmbeddings</span>

<span class="n">MONGODB_URI</span> <span class="o">=</span> <span class="sh">"</span><span class="s">&lt;connection-string&gt;</span><span class="sh">"</span>

<span class="n">history</span> <span class="o">=</span> <span class="nc">MongoDBChatMessageHistory</span><span class="p">(</span>
    <span class="n">session_id</span><span class="o">=</span><span class="sh">"</span><span class="s">user-42:thread-7</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">connection_string</span><span class="o">=</span><span class="n">MONGODB_URI</span><span class="p">,</span>
    <span class="n">database_name</span><span class="o">=</span><span class="sh">"</span><span class="s">agent_memory</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">collection_name</span><span class="o">=</span><span class="sh">"</span><span class="s">chat_history</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">history_size</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">vector_store</span> <span class="o">=</span> <span class="n">MongoDBAtlasVectorSearch</span><span class="p">.</span><span class="nf">from_connection_string</span><span class="p">(</span>
    <span class="n">connection_string</span><span class="o">=</span><span class="n">MONGODB_URI</span><span class="p">,</span>
    <span class="n">namespace</span><span class="o">=</span><span class="sh">"</span><span class="s">agent_memory.user_memories</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">embedding</span><span class="o">=</span><span class="nc">OpenAIEmbeddings</span><span class="p">(</span><span class="n">model</span><span class="o">=</span><span class="sh">"</span><span class="s">text-embedding-3-small</span><span class="sh">"</span><span class="p">),</span>
    <span class="n">index_name</span><span class="o">=</span><span class="sh">"</span><span class="s">vector_index</span><span class="sh">"</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">vector_store</span><span class="p">.</span><span class="nf">add_documents</span><span class="p">(</span>
    <span class="p">[</span>
        <span class="nc">Document</span><span class="p">(</span>
            <span class="n">page_content</span><span class="o">=</span><span class="sh">"</span><span class="s">User prefers vegan restaurants near SoHo.</span><span class="sh">"</span><span class="p">,</span>
            <span class="n">metadata</span><span class="o">=</span><span class="p">{</span><span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user-42</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">memory_type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">semantic</span><span class="sh">"</span><span class="p">},</span>
        <span class="p">),</span>
        <span class="nc">Document</span><span class="p">(</span>
            <span class="n">page_content</span><span class="o">=</span><span class="sh">"</span><span class="s">Invoice 8419 was disputed last month.</span><span class="sh">"</span><span class="p">,</span>
            <span class="n">metadata</span><span class="o">=</span><span class="p">{</span><span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user-42</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">memory_type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">episodic</span><span class="sh">"</span><span class="p">},</span>
        <span class="p">),</span>
    <span class="p">]</span>
<span class="p">)</span>

<span class="n">history</span><span class="p">.</span><span class="nf">add_user_message</span><span class="p">(</span><span class="sh">"</span><span class="s">Find dinner options for me in SoHo.</span><span class="sh">"</span><span class="p">)</span>

<span class="n">retriever</span> <span class="o">=</span> <span class="nc">MongoDBAtlasHybridSearchRetriever</span><span class="p">(</span>
    <span class="n">vectorstore</span><span class="o">=</span><span class="n">vector_store</span><span class="p">,</span>
    <span class="n">search_index_name</span><span class="o">=</span><span class="sh">"</span><span class="s">search_index</span><span class="sh">"</span><span class="p">,</span>
    <span class="n">top_k</span><span class="o">=</span><span class="mi">5</span><span class="p">,</span>
    <span class="n">fulltext_penalty</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>
    <span class="n">vector_penalty</span><span class="o">=</span><span class="mi">50</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">docs</span> <span class="o">=</span> <span class="n">retriever</span><span class="p">.</span><span class="nf">invoke</span><span class="p">(</span><span class="sh">"</span><span class="s">Find dinner options for a vegan user in SoHo</span><span class="sh">"</span><span class="p">)</span>
<span class="k">for</span> <span class="n">doc</span> <span class="ow">in</span> <span class="n">docs</span><span class="p">:</span>
    <span class="nf">print</span><span class="p">(</span><span class="n">doc</span><span class="p">.</span><span class="n">page_content</span><span class="p">,</span> <span class="n">doc</span><span class="p">.</span><span class="n">metadata</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></div></div>

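<p>Hybrid retrieval is not optional in a serious memory system. “User prefers vegan restaurants near SoHo” is a semantic memory: a question like “Where should I book dinner tonight?” shares almost no words with it and has to be matched by meaning. “Invoice 8419” is lexical: an exact identifier that has to match token for token. MongoDB’s hybrid retriever exists because production memory contains both.</p>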
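<p>The fusion step can be sketched in plain Python. This is a generic reciprocal rank fusion example, not the retriever’s actual implementation: the function name <code class="language-plaintext highlighter-rouge">reciprocal_rank_fusion</code> and the document ids are hypothetical, and the exact scoring formula inside <code class="language-plaintext highlighter-rouge">MongoDBAtlasHybridSearchRetriever</code> may differ. The penalty arguments are assumed to play the same role as <code class="language-plaintext highlighter-rouge">vector_penalty</code> and <code class="language-plaintext highlighter-rouge">fulltext_penalty</code> above, where a larger penalty flattens the advantage of top-ranked hits from that source.</p>

```python
from collections import defaultdict


def reciprocal_rank_fusion(vector_hits, fulltext_hits,
                           vector_penalty=50, fulltext_penalty=50, top_k=5):
    """Fuse two ranked lists of document ids into one ranking.

    Each hit contributes 1 / (rank + penalty) to its document's score,
    so documents found by both searches accumulate a higher total.
    """
    scores = defaultdict(float)
    sources = ((vector_penalty, vector_hits), (fulltext_penalty, fulltext_hits))
    for penalty, hits in sources:
        for rank, doc_id in enumerate(hits, start=1):
            scores[doc_id] += 1.0 / (rank + penalty)
    # Highest fused score first; ties keep first-seen order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Hypothetical result lists from the two searches, best match first.
vector_hits = ["pref-vegan-soho", "pref-italian", "invoice-8419"]
fulltext_hits = ["invoice-8419", "invoice-8420"]

fused = reciprocal_rank_fusion(vector_hits, fulltext_hits)
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

<p>In this sketch the invoice document ranks first because both searches returned it, even though vector search alone placed it last. That is the behavior you want when a query mixes an exact identifier with a fuzzy intent.</p>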

<h2 id="google-adk-the-sessions-and-memory-model-maps-cleanly-to-mongodb">Google ADK: the Sessions-and-Memory model maps cleanly to MongoDB</h2>

<p><a href="https://adk.dev/sessions/session/">Google ADK</a> makes the architecture explicit. <a href="https://adk.dev/runtime/event-loop/"><code class="language-plaintext highlighter-rouge">SessionService</code></a> manages session objects, applies <code class="language-plaintext highlighter-rouge">state_delta</code>, and appends event history. <a href="https://adk.dev/runtime/event-loop/"><code class="language-plaintext highlighter-rouge">MemoryService</code></a> manages long-term semantic memory across sessions. The <a href="https://adk.dev/sessions/session/">Sessions docs</a> currently list <code class="language-plaintext highlighter-rouge">InMemorySessionService</code>, <code class="language-plaintext highlighter-rouge">VertexAiSessionService</code>, and <code class="language-plaintext highlighter-rouge">DatabaseSessionService</code>, so MongoDB is not a built-in backend today. But ADK exposes <a href="https://adk.dev/api-reference/python/">base session and memory service abstractions</a>, which makes MongoDB a natural implementation target rather than a workaround. <a href="https://docs.cloud.google.com/agent-builder/agent-engine/memory-bank/overview">Memory Bank</a> then adds the identity-scoped memory semantics on top.</p>

<p>A MongoDB-backed ADK deployment should separate sessions, events, and distilled memories into dedicated collections. That mirrors ADK’s documented split between mutable session state, append-only event history, and searchable long-term memory.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="kn">from</span> <span class="n">datetime</span> <span class="kn">import</span> <span class="n">datetime</span><span class="p">,</span> <span class="n">timezone</span>
<span class="kn">from</span> <span class="n">pymongo</span> <span class="kn">import</span> <span class="n">ASCENDING</span><span class="p">,</span> <span class="n">MongoClient</span>

<span class="n">MONGODB_URI</span> <span class="o">=</span> <span class="sh">"</span><span class="s">&lt;connection-string&gt;</span><span class="sh">"</span>

<span class="n">client</span> <span class="o">=</span> <span class="nc">MongoClient</span><span class="p">(</span><span class="n">MONGODB_URI</span><span class="p">)</span>
<span class="n">db</span> <span class="o">=</span> <span class="n">client</span><span class="p">[</span><span class="sh">"</span><span class="s">agent_memory</span><span class="sh">"</span><span class="p">]</span>

<span class="n">sessions</span> <span class="o">=</span> <span class="n">db</span><span class="p">[</span><span class="sh">"</span><span class="s">adk_sessions</span><span class="sh">"</span><span class="p">]</span>
<span class="n">events</span> <span class="o">=</span> <span class="n">db</span><span class="p">[</span><span class="sh">"</span><span class="s">adk_events</span><span class="sh">"</span><span class="p">]</span>
<span class="n">memories</span> <span class="o">=</span> <span class="n">db</span><span class="p">[</span><span class="sh">"</span><span class="s">adk_memories</span><span class="sh">"</span><span class="p">]</span>

<span class="n">sessions</span><span class="p">.</span><span class="nf">create_index</span><span class="p">(</span>
    <span class="p">[(</span><span class="sh">"</span><span class="s">app_name</span><span class="sh">"</span><span class="p">,</span> <span class="n">ASCENDING</span><span class="p">),</span> <span class="p">(</span><span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">,</span> <span class="n">ASCENDING</span><span class="p">),</span> <span class="p">(</span><span class="sh">"</span><span class="s">session_id</span><span class="sh">"</span><span class="p">,</span> <span class="n">ASCENDING</span><span class="p">)],</span>
    <span class="n">unique</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
<span class="p">)</span>

<span class="n">events</span><span class="p">.</span><span class="nf">create_index</span><span class="p">(</span>
    <span class="p">[</span>
        <span class="p">(</span><span class="sh">"</span><span class="s">app_name</span><span class="sh">"</span><span class="p">,</span> <span class="n">ASCENDING</span><span class="p">),</span>
        <span class="p">(</span><span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">,</span> <span class="n">ASCENDING</span><span class="p">),</span>
        <span class="p">(</span><span class="sh">"</span><span class="s">session_id</span><span class="sh">"</span><span class="p">,</span> <span class="n">ASCENDING</span><span class="p">),</span>
        <span class="p">(</span><span class="sh">"</span><span class="s">timestamp</span><span class="sh">"</span><span class="p">,</span> <span class="n">ASCENDING</span><span class="p">),</span>
    <span class="p">]</span>
<span class="p">)</span>

<span class="n">memories</span><span class="p">.</span><span class="nf">create_index</span><span class="p">([(</span><span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">,</span> <span class="n">ASCENDING</span><span class="p">),</span> <span class="p">(</span><span class="sh">"</span><span class="s">memory_type</span><span class="sh">"</span><span class="p">,</span> <span class="n">ASCENDING</span><span class="p">)])</span>
<span class="c1"># Create a MongoDB Vector Search index named "memory_vector_index" on memories.embedding
# (search_memories below queries it by that name).
# Mark user_id and memory_type as filter fields in the index definition.
</span>
<span class="k">def</span> <span class="nf">persist_session_turn</span><span class="p">(</span><span class="n">app_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">user_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">session_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">event</span><span class="p">:</span> <span class="nb">dict</span><span class="p">,</span> <span class="n">state</span><span class="p">:</span> <span class="nb">dict</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="n">now</span> <span class="o">=</span> <span class="n">datetime</span><span class="p">.</span><span class="nf">now</span><span class="p">(</span><span class="n">timezone</span><span class="p">.</span><span class="n">utc</span><span class="p">)</span>

    <span class="n">sessions</span><span class="p">.</span><span class="nf">update_one</span><span class="p">(</span>
        <span class="p">{</span><span class="sh">"</span><span class="s">app_name</span><span class="sh">"</span><span class="p">:</span> <span class="n">app_name</span><span class="p">,</span> <span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">:</span> <span class="n">user_id</span><span class="p">,</span> <span class="sh">"</span><span class="s">session_id</span><span class="sh">"</span><span class="p">:</span> <span class="n">session_id</span><span class="p">},</span>
        <span class="p">{</span>
            <span class="sh">"</span><span class="s">$set</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">state</span><span class="sh">"</span><span class="p">:</span> <span class="n">state</span><span class="p">,</span> <span class="sh">"</span><span class="s">updated_at</span><span class="sh">"</span><span class="p">:</span> <span class="n">now</span><span class="p">},</span>
            <span class="sh">"</span><span class="s">$setOnInsert</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">created_at</span><span class="sh">"</span><span class="p">:</span> <span class="n">now</span><span class="p">},</span>
        <span class="p">},</span>
        <span class="n">upsert</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
    <span class="p">)</span>

    <span class="n">events</span><span class="p">.</span><span class="nf">insert_one</span><span class="p">(</span>
        <span class="p">{</span>
            <span class="sh">"</span><span class="s">app_name</span><span class="sh">"</span><span class="p">:</span> <span class="n">app_name</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">:</span> <span class="n">user_id</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">session_id</span><span class="sh">"</span><span class="p">:</span> <span class="n">session_id</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">timestamp</span><span class="sh">"</span><span class="p">:</span> <span class="n">event</span><span class="p">[</span><span class="sh">"</span><span class="s">timestamp</span><span class="sh">"</span><span class="p">],</span>
            <span class="sh">"</span><span class="s">event</span><span class="sh">"</span><span class="p">:</span> <span class="n">event</span><span class="p">,</span>
        <span class="p">}</span>
    <span class="p">)</span>

<span class="k">def</span> <span class="nf">store_memory</span><span class="p">(</span><span class="n">user_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">content</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">embedding</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">],</span> <span class="n">source_session_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="bp">None</span><span class="p">:</span>
    <span class="n">memories</span><span class="p">.</span><span class="nf">insert_one</span><span class="p">(</span>
        <span class="p">{</span>
            <span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">:</span> <span class="n">user_id</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">memory_type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">semantic</span><span class="sh">"</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="n">content</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">embedding</span><span class="sh">"</span><span class="p">:</span> <span class="n">embedding</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">source_session_ids</span><span class="sh">"</span><span class="p">:</span> <span class="p">[</span><span class="n">source_session_id</span><span class="p">],</span>
            <span class="sh">"</span><span class="s">created_at</span><span class="sh">"</span><span class="p">:</span> <span class="n">datetime</span><span class="p">.</span><span class="nf">now</span><span class="p">(</span><span class="n">timezone</span><span class="p">.</span><span class="n">utc</span><span class="p">),</span>
        <span class="p">}</span>
    <span class="p">)</span>

<span class="k">def</span> <span class="nf">search_memories</span><span class="p">(</span><span class="n">user_id</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">query_embedding</span><span class="p">:</span> <span class="nb">list</span><span class="p">[</span><span class="nb">float</span><span class="p">]):</span>
    <span class="n">pipeline</span> <span class="o">=</span> <span class="p">[</span>
        <span class="p">{</span>
            <span class="sh">"</span><span class="s">$vectorSearch</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
                <span class="sh">"</span><span class="s">index</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">memory_vector_index</span><span class="sh">"</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">path</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">embedding</span><span class="sh">"</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">queryVector</span><span class="sh">"</span><span class="p">:</span> <span class="n">query_embedding</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">numCandidates</span><span class="sh">"</span><span class="p">:</span> <span class="mi">100</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">limit</span><span class="sh">"</span><span class="p">:</span> <span class="mi">5</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">filter</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">user_id</span><span class="sh">"</span><span class="p">:</span> <span class="n">user_id</span><span class="p">,</span> <span class="sh">"</span><span class="s">memory_type</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">semantic</span><span class="sh">"</span><span class="p">},</span>
            <span class="p">}</span>
        <span class="p">},</span>
        <span class="p">{</span>
            <span class="sh">"</span><span class="s">$project</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
                <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">memory_type</span><span class="sh">"</span><span class="p">:</span> <span class="mi">1</span><span class="p">,</span>
                <span class="sh">"</span><span class="s">score</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span><span class="sh">"</span><span class="s">$meta</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">vectorSearchScore</span><span class="sh">"</span><span class="p">},</span>
            <span class="p">}</span>
        <span class="p">},</span>
    <span class="p">]</span>
    <span class="k">return</span> <span class="nf">list</span><span class="p">(</span><span class="n">memories</span><span class="p">.</span><span class="nf">aggregate</span><span class="p">(</span><span class="n">pipeline</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This is a design sketch, not an official ADK adapter. The point is that ADK’s abstractions already describe a storage model MongoDB handles well: mutable session state, append-only events, and searchable long-term memory. Once a <code class="language-plaintext highlighter-rouge">MemoryService</code> exists, ADK’s built-in <a href="https://adk.dev/sessions/memory/"><code class="language-plaintext highlighter-rouge">PreloadMemory</code> and <code class="language-plaintext highlighter-rouge">LoadMemory</code> tools</a> can use it.</p>

<h2 id="dedicated-memory-layers-also-fit-mem0-on-mongodb">Dedicated memory layers also fit: Mem0 on MongoDB</h2>

<p>MongoDB is not only useful when memory is native to the agent framework. <a href="https://docs.mem0.ai/components/vectordbs/dbs/mongodb">Mem0’s MongoDB backend</a> supports MongoDB directly as a vector database for memory storage and retrieval. That matters because it shows MongoDB works both as the application database and as the substrate beneath a dedicated memory layer.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="kn">from</span> <span class="n">mem0</span> <span class="kn">import</span> <span class="n">Memory</span>

<span class="n">config</span> <span class="o">=</span> <span class="p">{</span>
    <span class="sh">"</span><span class="s">vector_store</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
        <span class="sh">"</span><span class="s">provider</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">mongodb</span><span class="sh">"</span><span class="p">,</span>
        <span class="sh">"</span><span class="s">config</span><span class="sh">"</span><span class="p">:</span> <span class="p">{</span>
            <span class="sh">"</span><span class="s">db_name</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">mem0_db</span><span class="sh">"</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">collection_name</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">mem0_collection</span><span class="sh">"</span><span class="p">,</span>
            <span class="sh">"</span><span class="s">mongo_uri</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">&lt;connection-string&gt;</span><span class="sh">"</span><span class="p">,</span>
        <span class="p">},</span>
    <span class="p">}</span>
<span class="p">}</span>

<span class="n">memory</span> <span class="o">=</span> <span class="n">Memory</span><span class="p">.</span><span class="nf">from_config</span><span class="p">(</span><span class="n">config</span><span class="p">)</span>

<span class="n">messages</span> <span class="o">=</span> <span class="p">[</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">I</span><span class="sh">'</span><span class="s">m planning to watch a movie tonight.</span><span class="sh">"</span><span class="p">},</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">assistant</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">What genres do you like?</span><span class="sh">"</span><span class="p">},</span>
    <span class="p">{</span><span class="sh">"</span><span class="s">role</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">user</span><span class="sh">"</span><span class="p">,</span> <span class="sh">"</span><span class="s">content</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">I love sci-fi, not thrillers.</span><span class="sh">"</span><span class="p">},</span>
<span class="p">]</span>

<span class="n">memory</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="n">messages</span><span class="p">,</span> <span class="n">user_id</span><span class="o">=</span><span class="sh">"</span><span class="s">alice</span><span class="sh">"</span><span class="p">,</span> <span class="n">metadata</span><span class="o">=</span><span class="p">{</span><span class="sh">"</span><span class="s">category</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">movies</span><span class="sh">"</span><span class="p">})</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The architectural value is the same as in LangGraph and LangChain: persistent memory objects, vector retrieval, and application data can live in one operational system instead of being spread across separate services.</p>

<h2 id="why-mongodb-is-the-best-choice-here">Why MongoDB is the best choice here</h2>

<p>MongoDB is the best choice when you want one system to hold agent state, long-term memory, retrieval data, and the application records the agent reasons over. The document model matches how current frameworks persist memory. <a href="https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/">Vector Search</a> and <a href="https://www.mongodb.com/docs/atlas/ai-integrations/langchain/hybrid-search/">Search</a> cover recall. <a href="https://www.mongodb.com/docs/manual/core/index-ttl/">TTL indexes</a> and <a href="https://www.mongodb.com/docs/manual/changestreams/">change streams</a> cover retention and event-driven memory extraction. <a href="https://www.mongodb.com/docs/atlas/ai-integrations/langchain/graph-rag/">GraphRAG</a> covers relationship-heavy data. The result is not “a vector store with extra features.” It is a memory layer that can also be the system of record. That is why MongoDB works as the brain of a modern AI application.</p>]]></content><author><name></name></author><category term="AI" /><category term="mongodb" /><category term="AI" /><category term="memory" /><category term="agents" /><summary type="html"><![CDATA[Production agents need two persistence layers: thread-scoped state and cross-session memory. Google ADK Sessions stores events and state for a single conversation, while MemoryService handles recall across sessions. LangGraph memory makes the same split with a checkpointer for short-term memory and a store for long-term memory, and LangChain long-term memory builds on LangGraph stores that persist JSON documents by namespace and key. 
The memory architecture has already converged.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.alexbevi.com/images/mongodb-brain-banner.png" /><media:content medium="image" url="https://www.alexbevi.com/images/mongodb-brain-banner.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reverse Engineering Harvester with Ghidra and Codex - Part 6: Timers</title><link href="https://www.alexbevi.com/blog/2026/04/14/reverse-engineering-harvester-with-ghidra-and-codex-part-6-timers/" rel="alternate" type="text/html" title="Reverse Engineering Harvester with Ghidra and Codex - Part 6: Timers" /><published>2026-04-14T06:24:07-04:00</published><updated>2026-04-29T09:45:29-04:00</updated><id>https://www.alexbevi.com/blog/2026/04/14/reverse-engineering-harvester-with-ghidra-and-codex-part-6-timers</id><content type="html" xml:base="https://www.alexbevi.com/blog/2026/04/14/reverse-engineering-harvester-with-ghidra-and-codex-part-6-timers/"><![CDATA[<blockquote class="prompt-tip mb-6">
  <strong>Series:&nbsp;<a href="/blog/2026/03/14/reverse-engineering-a-dos-game-with-ghidra-and-codex/">Reverse Engineering Harvester</a></strong><p>This review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.</p><ul class="list-none space-y-1"><li>← <a href="/blog/2026/03/29/reverse-engineering-harvester-with-ghidra-and-codex-part-5-debugging-audio-issues/">Reverse Engineering Harvester with Ghidra and Codex - Part 5: Debugging Audio Issues</a></li><li><strong>Reverse Engineering Harvester with Ghidra and Codex - Part 6: Timers</strong></li></ul>

  <p>Article 6 of 6 in this series.</p>
</blockquote>

<style>
.content pre, .content pre code {
white-space: pre-wrap !important;
word-break: break-word !important;
overflow-wrap: break-word !important;
overflow-x: hidden !important;
}

.highlight {
overflow-x: visible !important;
}
</style>

<p>The Harvester game engine supports time-based functionality through the same scripting system that drives room transitions, object interactions, NPC state changes, and cutscenes.</p>

<p>While re-implementing the engine, one of the main challenges has been figuring out how to debug those systems in a way that gives future Codex prompts useful context. Logs are great after something fires, but timers have a different failure mode: you often need to know whether a timer exists, whether it is currently enabled, how much time is left, and what command chain it will run before it expires.</p>

<p>That is especially important in Harvester because timers are not just visual delays. They can damage the player, monsterfy NPCs, advance dialogue staging, unlock doors, and trigger authored room events.</p>

<h2 id="timer-records-in-harvestscr">Timer records in <code class="language-plaintext highlighter-rouge">HARVEST.SCR</code></h2>

<p>In <a href="/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-4-command-opcodes/">part 4</a> I covered the command-opcode side of the script. Timers sit beside those <code class="language-plaintext highlighter-rouge">COMMAND</code> records as their own world records. Once <code class="language-plaintext highlighter-rouge">HARVEST.SCR</code> is XOR-decoded, a timer looks like this:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>initialSeconds TIMER timerName roomName actionTag enabled looping global
</pre></td></tr></tbody></table></code></pre></div></div>

<p>For example:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>133 TIMER "ACID_TIMER3" "EYEHALL" "HURT_PC_ACIDA" "F" "F" "T"
266 TIMER "ACID_TIMER2" "EYEHALL" "HURT_PC_ACIDB" "F" "F" "T"
400 TIMER "ACID_TIMER"  "EYEHALL" "KILL_PC_ACID"  "F" "F" "T"
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The fields map cleanly to the runtime data structure:</p>

<table>
  <thead>
    <tr>
      <th>Field</th>
      <th>Meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">initialSeconds</code></td>
      <td>Countdown length in seconds. The runtime schedules it against a centisecond clock by multiplying this value by <code class="language-plaintext highlighter-rouge">100</code>.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">timerName</code></td>
      <td>Stable lookup key. <code class="language-plaintext highlighter-rouge">SET_TIMER</code> and <code class="language-plaintext highlighter-rouge">KILL_TIMER</code> refer to this name.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">roomName</code></td>
      <td>Room/scope key. Room setup materializes timer entities whose room matches the current room.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">actionTag</code></td>
      <td>Command-chain entry point to execute when the timer expires.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">enabled</code></td>
      <td><code class="language-plaintext highlighter-rouge">T</code> if the timer starts enabled when the room is built.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">looping</code></td>
      <td><code class="language-plaintext highlighter-rouge">T</code> if the timer should restart after firing. The decoded script I checked uses only one-shot timers.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">global</code></td>
      <td><code class="language-plaintext highlighter-rouge">T</code> if the live timer entity should be preserved across room transitions.</td>
    </tr>
  </tbody>
</table>
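
<p>As a rough illustration of the field layout above, here is a minimal C++ sketch that parses one decoded <code class="language-plaintext highlighter-rouge">TIMER</code> line into a struct. The <code class="language-plaintext highlighter-rouge">TimerRecord</code> layout and <code class="language-plaintext highlighter-rouge">parseTimerLine</code> helper are hypothetical names for this post, not the code in the ScummVM engine, and the sketch assumes every field after the keyword is double-quoted.</p>

```cpp
#include <cstdint>
#include <iomanip>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical mirror of one decoded TIMER line; field names follow the table above.
struct TimerRecord {
	uint32_t initialSeconds = 0;
	std::string name;
	std::string room;
	std::string actionTag;
	bool enabled = false;
	bool looping = false;
	bool global = false;

	// The runtime schedules against a centisecond clock, so seconds are scaled by 100.
	uint32_t initialCentiseconds() const { return initialSeconds * 100; }
};

// Parses: initialSeconds TIMER "name" "room" "actionTag" "T|F" "T|F" "T|F"
// Returns std::nullopt for anything that is not a well-formed TIMER line.
std::optional<TimerRecord> parseTimerLine(const std::string &line) {
	std::istringstream in(line);
	TimerRecord rec;
	std::string keyword;
	if (!(in >> rec.initialSeconds >> keyword) || keyword != "TIMER")
		return std::nullopt;

	// std::quoted strips the surrounding double quotes from each field.
	std::vector<std::string> fields;
	std::string field;
	while (in >> std::quoted(field))
		fields.push_back(field);
	if (fields.size() != 6)
		return std::nullopt;

	rec.name = fields[0];
	rec.room = fields[1];
	rec.actionTag = fields[2];
	rec.enabled = fields[3] == "T";
	rec.looping = fields[4] == "T";
	rec.global = fields[5] == "T";
	return rec;
}
```

<p>Fed the first acid timer line from the example, this yields a disabled, non-looping, global record with a 13300-centisecond countdown.</p>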

<p>The decoded script contains 76 real <code class="language-plaintext highlighter-rouge">TIMER</code> records. Most are initially disabled and are started by a nearby <code class="language-plaintext highlighter-rouge">COMMAND</code>, but a few start enabled as part of room setup.</p>

<p>At runtime, these timers are materialized as invisible runtime entities. That turned out to be an important detail: timer state is not only a field in the parsed script record. A live countdown exists in the room entity list, and global timers can survive room changes by preserving that live entity instead of destroying it with the rest of the room.</p>
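
<p>A small sketch of what “preserving the live entity” could look like on a room change. <code class="language-plaintext highlighter-rouge">LiveTimer</code> and <code class="language-plaintext highlighter-rouge">leaveRoom</code> are illustrative names, and the real engine keeps timers in a general entity list rather than a dedicated vector.</p>

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical live timer entity; the real room entity list holds more than timers.
struct LiveTimer {
	std::string name;
	bool global;          // copied from the record's "global" field
	uint32_t remainingCs; // live countdown in centiseconds
};

// On a room transition, room-scoped timers are destroyed with the room,
// while global timers stay in the list with their countdown untouched.
void leaveRoom(std::vector<LiveTimer> &entities) {
	entities.erase(
		std::remove_if(entities.begin(), entities.end(),
			[](const LiveTimer &t) { return !t.global; }),
		entities.end());
}
```

<p>The point of the sketch is that the countdown lives on the entity, so a global timer that survives <code class="language-plaintext highlighter-rouge">leaveRoom</code> keeps whatever time it had left.</p>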

<p>When a timer entity expires, the room loop records the expired timer name, resolves the backing <code class="language-plaintext highlighter-rouge">TimerRecord</code>, and dispatches the record’s <code class="language-plaintext highlighter-rouge">actionTag</code>. In other words, the timer name is just the lookup key. The action is whatever command chain is stored in the timer record.</p>
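
<p>The expiry path described above can be sketched roughly as follows. This is a simplified model under stated assumptions: the type names and <code class="language-plaintext highlighter-rouge">advanceTimers</code> are hypothetical, timers are treated as one-shot, and “dispatch” is reduced to returning the action tags that would be run.</p>

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, trimmed-down types for illustration only.
struct TimerRecord {
	std::string actionTag;
};

struct TimerEntity {
	std::string name; // lookup key into the timer records
	bool enabled;
	uint32_t remainingCs; // centiseconds left
};

// Advances every enabled timer by elapsedCs. For each timer that expires,
// resolves the backing record by name and collects its actionTag.
std::vector<std::string> advanceTimers(std::vector<TimerEntity> &entities,
		const std::map<std::string, TimerRecord> &records, uint32_t elapsedCs) {
	std::vector<std::string> firedTags;
	for (TimerEntity &entity : entities) {
		if (!entity.enabled)
			continue;
		if (entity.remainingCs > elapsedCs) {
			entity.remainingCs -= elapsedCs;
			continue;
		}
		// Expired: one-shot behaviour assumed, so the timer disables itself.
		entity.remainingCs = 0;
		entity.enabled = false;
		auto it = records.find(entity.name);
		if (it != records.end())
			firedTags.push_back(it->second.actionTag);
	}
	return firedTags;
}
```

<p>Keeping the name-to-record lookup separate from the countdown mirrors the split described above: the entity only knows its name and remaining time, and the record decides what runs.</p>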

<h2 id="starting-and-stopping-timers">Starting and stopping timers</h2>

<p>Script command chains control timers with <code class="language-plaintext highlighter-rouge">SET_TIMER</code> and <code class="language-plaintext highlighter-rouge">KILL_TIMER</code>:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>COMMAND "SET_HALL_TIMER"   "SET_TIMER" "ACID_TIMER"  "ON" "" "KILL_TRIG_TIMER"
COMMAND "SET_HALL_2TIMER"  "SET_TIMER" "ACID_TIMER2" "ON" "" "SET_HALL_2TIMER2"
COMMAND "SET_HALL_2TIMER2" "SET_TIMER" "ACID_TIMER3" "ON" "" ""
</pre></td></tr></tbody></table></code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">SET_TIMER ... ON</code> enables a timer. If the timer was previously disabled, its current value is reset to the initial value. <code class="language-plaintext highlighter-rouge">SET_TIMER ... OFF</code> and <code class="language-plaintext highlighter-rouge">KILL_TIMER</code> both disable it.</p>
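
<p>Those semantics fit in a few lines. The sketch below uses hypothetical names (<code class="language-plaintext highlighter-rouge">TimerState</code>, <code class="language-plaintext highlighter-rouge">setTimer</code>, <code class="language-plaintext highlighter-rouge">killTimer</code>) and assumes that turning on an already-enabled timer leaves its countdown alone, which the description above implies but does not state outright.</p>

```cpp
#include <cstdint>

// Hypothetical minimal timer state for illustration.
struct TimerState {
	uint32_t initialSeconds;
	uint32_t currentSeconds;
	bool enabled;
};

// SET_TIMER name ON / OFF
void setTimer(TimerState &timer, bool on) {
	if (!on) {
		timer.enabled = false;
		return;
	}
	// Only a previously disabled timer gets its countdown reset.
	// Assumption: re-enabling a running timer does not restart it.
	if (!timer.enabled)
		timer.currentSeconds = timer.initialSeconds;
	timer.enabled = true;
}

// KILL_TIMER name: modelled here as equivalent to SET_TIMER ... OFF.
void killTimer(TimerState &timer) {
	setTimer(timer, false);
}
```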

<p>The subtle part is that timer commands often live in longer command chains. They are not isolated API calls. Starting a timer might be one step in a room-entry sequence, and the timer’s expiry might run another command chain that branches on flags, shows text, adjusts HP, or changes rooms.</p>

<p><img src="/images/ghidra6/scummvm-harvester-00001.png" alt="" />
<em>In the <code class="language-plaintext highlighter-rouge">MAINHALL</code> with <code class="language-plaintext highlighter-rouge">DEBUG_TIMERS</code> and <code class="language-plaintext highlighter-rouge">DEBUG_ROOM</code> toggled</em></p>

<p>A good example is entering the <code class="language-plaintext highlighter-rouge">MAINHALL</code> on Disc 3, which uses multiple global timers with staggered deadlines.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>133 TIMER "ACID_TIMER3" "EYEHALL" "HURT_PC_ACIDA" "F" "F" "T"
266 TIMER "ACID_TIMER2" "EYEHALL" "HURT_PC_ACIDB" "F" "F" "T"
400 TIMER "ACID_TIMER"  "EYEHALL" "KILL_PC_ACID"  "F" "F" "T"

COMMAND "HURT_PC_ACIDA"   "KILL_TIMER" "ACID_TIMER3" "" "" "HURT_PC_ACID"
COMMAND "HURT_PC_ACIDB"   "KILL_TIMER" "ACID_TIMER3" "" "" "HURT_PC_ACID"
COMMAND "HURT_PC_ACID"    "CHECK_FLAG" "CLEANED_CLOTHES" "" "HURT_PC_ACID_1" ""
COMMAND "HURT_PC_ACID_1"  "SHOW_TEXT"  "ACID_TEXT2" "" "" "HURT_PC_ACID_2"
COMMAND "HURT_PC_ACID_2"  "ADJ_HP"     "-7" "" "" ""
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The first two timers are warning/damage stages. When they expire, they run <code class="language-plaintext highlighter-rouge">HURT_PC_ACIDA</code> or <code class="language-plaintext highlighter-rouge">HURT_PC_ACIDB</code>, each of which kills a timer and then chains into <code class="language-plaintext highlighter-rouge">HURT_PC_ACID</code>, which checks the <code class="language-plaintext highlighter-rouge">CLEANED_CLOTHES</code> flag. If the clothes have not been cleaned, the game shows acid text and subtracts 7 HP. If the flag has been set, the branch target is empty and the command chain stops.</p>
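
<p>To make the branching concrete, here is a toy chain walker. It is only a sketch: <code class="language-plaintext highlighter-rouge">Command</code>, <code class="language-plaintext highlighter-rouge">World</code>, and <code class="language-plaintext highlighter-rouge">runChain</code> are hypothetical, only <code class="language-plaintext highlighter-rouge">CHECK_FLAG</code> and <code class="language-plaintext highlighter-rouge">ADJ_HP</code> have any effect, and the argument roles for <code class="language-plaintext highlighter-rouge">CHECK_FLAG</code> (third argument taken when the flag is clear, fourth when it is set) are inferred from this one example rather than from the full opcode table.</p>

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical COMMAND record: opcode plus four string arguments.
struct Command {
	std::string opcode;
	std::string arg1, arg2, arg3, arg4;
};

struct World {
	std::map<std::string, Command> commands; // keyed by trigger tag
	std::map<std::string, bool> flags;
	int hp = 100;
};

// Walks a command chain starting at `tag` and returns the tags it visited.
std::vector<std::string> runChain(World &world, std::string tag) {
	std::vector<std::string> visited;
	// The step cap guards against accidental cycles in authored data.
	for (int step = 0; step < 256 && !tag.empty(); ++step) {
		auto it = world.commands.find(tag);
		if (it == world.commands.end())
			break;
		const Command &cmd = it->second;
		visited.push_back(tag);

		if (cmd.opcode == "CHECK_FLAG") {
			// Assumed roles: arg3 = next tag when the flag is clear,
			// arg4 = next tag when it is set. An empty tag ends the chain.
			tag = world.flags[cmd.arg1] ? cmd.arg4 : cmd.arg3;
			continue;
		}
		if (cmd.opcode == "ADJ_HP")
			world.hp += std::stoi(cmd.arg1);
		// Every other opcode is a no-op in this sketch and follows arg4.
		tag = cmd.arg4;
	}
	return visited;
}
```

<p>With the acid records loaded and <code class="language-plaintext highlighter-rouge">CLEANED_CLOTHES</code> unset, starting at <code class="language-plaintext highlighter-rouge">HURT_PC_ACIDA</code> walks four commands and costs 7 HP. With the flag set, the walk stops at <code class="language-plaintext highlighter-rouge">HURT_PC_ACID</code>.</p>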

<p>The last timer, <code class="language-plaintext highlighter-rouge">ACID_TIMER</code>, is the hard fail path:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>400 TIMER "ACID_TIMER" "EYEHALL" "KILL_PC_ACID" "F" "F" "T"
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Cleaning the clothes is itself just another scripted interaction. Using the money on the cloakroom attendant starts <code class="language-plaintext highlighter-rouge">CLEAN_CLOTHES</code>, which shows text, sets the flag, removes the money, and disables the lethal acid timer:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>USEITEM "BARCASHFIVE" "CLOAKROOM" "CLOAK_ATND" "CLEAN_CLOTHES"

COMMAND "CLEAN_CLOTHES"   "SHOW_TEXT" "CLEANED_CLOTHS" "" "" "CLEAN_CLOTHES_1"
COMMAND "CLEAN_CLOTHES_1" "SET_FLAG"  "CLEANED_CLOTHES" "T" "" "CLEAN_CLOTHES_2"
COMMAND "CLEAN_CLOTHES_2" "DELETE"    "CLOAKROOM" "BARCASHFIVE" "" "CLEAN_CLOTHES_3"
COMMAND "CLEAN_CLOTHES_3" "DELETE"    "INVENTORY" "BARCASHFIVE" "" "CLEAN_CLOTHES_4"
COMMAND "CLEAN_CLOTHES_4" "SET_TIMER" "ACID_TIMER" "OFF" "" "CLEAN_CLOTHES_5"
COMMAND "CLEAN_CLOTHES_5" "KILL_TIMER" "ACID_TIMER" "" "" ""
</pre></td></tr></tbody></table></code></pre></div></div>

<p>This is the kind of script graph that is hard to reason about from static records alone. Some of the authored data is also a little odd: both intermediate acid chains kill <code class="language-plaintext highlighter-rouge">ACID_TIMER3</code>, even though the second one is entered by <code class="language-plaintext highlighter-rouge">ACID_TIMER2</code>. Watching the live timers makes it much easier to tell whether that is harmless authored data, a reimplementation bug, or a broken state sync.</p>

<h2 id="adding-a-timer-overlay">Adding a timer overlay</h2>

<p>To make this easier to debug, I asked Codex to add a console command that would render active timers directly over the room:</p>

<blockquote class="prompt-tip">
  <p><em>the game manages timers periodically. I want to introduce a DEBUG_TIMERS command that when enabled it will overlay text on the screen that will be the timer name, starting value, current value, action to take when timer expires.</em></p>

  <p><em>If multiple timers are active, they should appear one after the other. Draw this near the middle-left of the screen. It should be white text on a black background</em></p>
</blockquote>

<p>Rather than wiring this up myself, I let Codex do it, which took about 5 minutes.</p>

<p><img src="/images/ghidra6/prompt.png" alt="" /></p>

<p>The implementation is intentionally small:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">DEBUG_TIMERS</code> toggles a boolean on the Harvester engine.</li>
  <li>The room renderer checks that boolean after drawing the room and other debug overlays.</li>
  <li>The overlay walks the known timer records, finds matching live timer entities, filters out disabled timers, and formats each row as <code class="language-plaintext highlighter-rouge">name start=initial current=current action=tag</code>.</li>
  <li>Text is drawn near the middle-left of the screen using white text over a black rectangle.</li>
</ul>

<p>This means the overlay is showing live runtime state, not just parsed script data. If a timer is missing from the live entity list, disabled, or no longer counting down, it disappears.</p>

<p>The label format is deliberately boring:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>ACID_TIMER3 start=133 current=128 action=HURT_PC_ACIDA
ACID_TIMER2 start=266 current=261 action=HURT_PC_ACIDB
ACID_TIMER start=400 current=395 action=KILL_PC_ACID
</pre></td></tr></tbody></table></code></pre></div></div>

<p>That is exactly the information I need while debugging: what is active, how long it has left, and what will happen when it fires.</p>
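
<p>For reference, the row format can be reproduced with a few lines of C++. This is a sketch with hypothetical names (<code class="language-plaintext highlighter-rouge">OverlayTimer</code>, <code class="language-plaintext highlighter-rouge">formatTimerRow</code>, <code class="language-plaintext highlighter-rouge">buildOverlayRows</code>), not the code Codex generated. It covers only the filtering and string formatting, and assumes the overlay reports whole seconds.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical snapshot of one live timer as the overlay would see it.
struct OverlayTimer {
	std::string name;
	unsigned initialSeconds;
	unsigned currentSeconds;
	std::string actionTag;
	bool enabled;
};

// Formats a single row as: name start=initial current=current action=tag
std::string formatTimerRow(const OverlayTimer &timer) {
	return timer.name +
		" start=" + std::to_string(timer.initialSeconds) +
		" current=" + std::to_string(timer.currentSeconds) +
		" action=" + timer.actionTag;
}

// Builds one row per enabled timer, in list order; disabled timers are skipped.
std::vector<std::string> buildOverlayRows(const std::vector<OverlayTimer> &timers) {
	std::vector<std::string> rows;
	for (const OverlayTimer &timer : timers) {
		if (timer.enabled)
			rows.push_back(formatTimerRow(timer));
	}
	return rows;
}
```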

<video width="640" height="480" controls="">
    <source src="/images/ghidra6/harvester-timers.mp4" type="video/mp4" />
    Your browser does not support the video tag
</video>

<p>Having a visual indicator of timer progress, and of whether those timers were triggering the correct actions, made it much easier to discover some of the more obscure bugs.</p>

<p>The log lines below all come from the game’s scripts. In this run, the clothes have been cleaned, so the acid timers still reach their action chains, but the <code class="language-plaintext highlighter-rouge">CLEANED_CLOTHES</code> flag prevents the HP penalty path from continuing.</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>Harvester: action tag 'CLEAN_CLOTHES' step=0 tag='CLEAN_CLOTHES' opcode='SHOW_TEXT' args=['CLEANED_CLOTHS','','','CLEAN_CLOTHES_1']
Harvester: action tag 'CLEAN_CLOTHES_1' step=0 tag='CLEAN_CLOTHES_1' opcode='SET_FLAG' args=['CLEANED_CLOTHES','T','','CLEAN_CLOTHES_2']
Harvester: action tag 'CLEAN_CLOTHES_1' SET_FLAG 'CLEANED_CLOTHES' 0 -&gt; 1 existed=1 changed=1
Harvester: action tag 'CLEAN_CLOTHES_1' step=1 tag='CLEAN_CLOTHES_2' opcode='DELETE' args=['CLOAKROOM','BARCASHFIVE','','CLEAN_CLOTHES_3']
Harvester: action tag 'CLEAN_CLOTHES_1' step=2 tag='CLEAN_CLOTHES_3' opcode='DELETE' args=['INVENTORY','BARCASHFIVE','','CLEAN_CLOTHES_4']
Harvester: action tag 'CLEAN_CLOTHES_1' step=3 tag='CLEAN_CLOTHES_4' opcode='SET_TIMER' args=['ACID_TIMER','OFF','','CLEAN_CLOTHES_5']
Harvester: action tag 'CLEAN_CLOTHES_1' step=4 tag='CLEAN_CLOTHES_5' opcode='KILL_TIMER' args=['ACID_TIMER','','','']

Harvester: timer command 'ACID_TIMER3' step=0 tag='HURT_PC_ACIDA' opcode='KILL_TIMER' args=['ACID_TIMER3','','','HURT_PC_ACID']
Harvester: timer command 'ACID_TIMER3' step=1 tag='HURT_PC_ACID' opcode='CHECK_FLAG' args=['CLEANED_CLOTHES','','HURT_PC_ACID_1','']
Harvester: timer command 'ACID_TIMER3' flag 'CLEANED_CLOTHES' -&gt; 1

Harvester: timer command 'ACID_TIMER2' step=0 tag='HURT_PC_ACIDB' opcode='KILL_TIMER' args=['ACID_TIMER3','','','HURT_PC_ACID']
Harvester: timer command 'ACID_TIMER2' step=1 tag='HURT_PC_ACID' opcode='CHECK_FLAG' args=['CLEANED_CLOTHES','','HURT_PC_ACID_1','']
Harvester: timer command 'ACID_TIMER2' flag 'CLEANED_CLOTHES' -&gt; 1
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The overlay was also useful outside this one hallway. Some NPC state changes are timer-driven too:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>COMMAND "START_INQ_TIM" "SET_TIMER" "INQUIST_ATTACK_TIMER" "ON" "" ""
120 TIMER "INQUIST_ATTACK_TIMER" "PAIN" "MNSTFY_INQUIST" "F" "F" "F"

COMMAND "START_MERCY_TIMR" "SET_TIMER" "GLADIATOR_TIMER" "ON" "" ""
30 TIMER "GLADIATOR_TIMER" "MERCY" "MNST_GLAD" "F" "F" "F"
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Those are awkward to verify with logs alone because nothing visible happens until the timer expires. With <code class="language-plaintext highlighter-rouge">DEBUG_TIMERS</code> enabled, I can start the dialogue branch, see the countdown appear, wait for it to hit zero, and then check whether the expected monsterfy command chain ran.</p>

<p>That has become the general debugging pattern for this engine work: use Ghidra and the decoded script to understand the original data model, add small runtime instrumentation when the model is too indirect to observe comfortably, then feed the resulting logs, screenshots, and videos back into the next prompt.</p>]]></content><author><name></name></author><category term="Programming" /><category term="programming" /><category term="reverse-engineering" /><category term="scummvm" /><category term="ghidra" /><summary type="html"><![CDATA[Series:&nbsp;Reverse Engineering HarvesterThis review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.← Reverse Engineering Harvester with Ghidra and Codex - Part 5: Debugging Audio IssuesReverse Engineering Harvester with Ghidra and Codex - Part 6: Timers Article 6 of 6 in this series.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" /><media:content medium="image" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reverse Engineering Harvester with Ghidra and Codex - Part 5: Debugging Audio Issues</title><link href="https://www.alexbevi.com/blog/2026/03/29/reverse-engineering-harvester-with-ghidra-and-codex-part-5-debugging-audio-issues/" rel="alternate" type="text/html" title="Reverse Engineering Harvester with Ghidra and Codex - Part 5: Debugging Audio Issues" /><published>2026-03-29T20:41:11-04:00</published><updated>2026-04-29T09:45:29-04:00</updated><id>https://www.alexbevi.com/blog/2026/03/29/reverse-engineering-harvester-with-ghidra-and-codex-part-5-debugging-audio-issues</id><content type="html" 
xml:base="https://www.alexbevi.com/blog/2026/03/29/reverse-engineering-harvester-with-ghidra-and-codex-part-5-debugging-audio-issues/"><![CDATA[<blockquote class="prompt-tip mb-6">
  <strong>Series:&nbsp;<a href="/blog/2026/03/14/reverse-engineering-a-dos-game-with-ghidra-and-codex/">Reverse Engineering Harvester</a></strong><p>This review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.</p><ul class="list-none space-y-1"><li>← <a href="/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-4-command-opcodes/">Reverse Engineering Harvester with Ghidra and Codex - Part 4: Command Opcodes</a></li><li><strong>Reverse Engineering Harvester with Ghidra and Codex - Part 5: Debugging Audio Issues</strong></li><li>→ <a href="/blog/2026/04/14/reverse-engineering-harvester-with-ghidra-and-codex-part-6-timers/">Reverse Engineering Harvester with Ghidra and Codex - Part 6: Timers</a></li></ul>

  <p>Article 5 of 6 in this series.</p>
</blockquote>

<style>
.content pre, .content pre code {
white-space: pre-wrap !important;
word-break: break-word !important;
overflow-wrap: break-word !important;
overflow-x: hidden !important;
}

.highlight {
overflow-x: visible !important;
}
</style>

<p>An issue I’ve noticed with the <a href="https://github.com/alexbevi/scummvm/tree/harvester"><code class="language-plaintext highlighter-rouge">harvester</code> engine</a> implementation is that every audio sample starts with a distinct “popping” sound when played. I thought this could be an issue with how I’d approached the re-implementation or disassembly, but after a few passes at decompiling the audio code I always ended up with the same result.</p>

<video width="640" height="480" controls="">
    <source src="/images/ghidra5/harvester-popping.mp4" type="video/mp4" />
    Your browser does not support the video tag
</video>

<p>While doing some unrelated research I stumbled on <a href="https://codecs.multimedia.cx/">Kostya’s Boring Codec World</a> and a post he’d made on <a href="https://codecs.multimedia.cx/2021/01/a-look-on-futurevision-formats/">FutureVision formats</a>, discussing Harvester audio/video specifically:</p>

<blockquote>
  <p><em>“Music format is based on IMA ADPCM with minimalistic file header and the default predictor. Which means it needs some time to adapt to the actual coming signal amplitudes. In result the old format description based on reverse engineering recommends skipping first 7-57 bytes right after the header because of the garbage sound it produces. When I looked into the binary reference (an adventure that will be described in an upcoming post), it turned out that in certain cases they simply zero out the first 50 decoded samples.”</em></p>
</blockquote>
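
<p>To see why a default predictor needs time to adapt, here is a sketch of a single decode step in standard IMA ADPCM. The tables are the standard IMA ones and the names (<code class="language-plaintext highlighter-rouge">ImaState</code>, <code class="language-plaintext highlighter-rouge">decodeNibble</code>) are hypothetical. This is not lifted from <code class="language-plaintext highlighter-rouge">HARVEST.LE</code>, and Harvester’s FCMP variant may differ in its details.</p>

```cpp
#include <algorithm>
#include <cstdint>

// Standard IMA ADPCM tables (assumed; the native FCMP decoder may differ).
static const int kIndexAdjust[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };
static const int kStepTable[89] = {
	7, 8, 9, 10, 11, 12, 13, 14, 16, 17, 19, 21, 23, 25, 28, 31,
	34, 37, 41, 45, 50, 55, 60, 66, 73, 80, 88, 97, 107, 118, 130, 143,
	157, 173, 190, 209, 230, 253, 279, 307, 337, 371, 408, 449, 494, 544,
	598, 658, 724, 796, 876, 963, 1060, 1166, 1282, 1411, 1552, 1707,
	1878, 2066, 2272, 2499, 2749, 3024, 3327, 3660, 4026, 4428, 4871,
	5358, 5894, 6484, 7132, 7845, 8630, 9493, 10442, 11487, 12635, 13899,
	15289, 16818, 18500, 20350, 22385, 24623, 27086, 29794, 32767
};

struct ImaState {
	int predictor = 0; // "default predictor": decoding starts from silence
	int stepIndex = 0; // and from the smallest step size
};

// Decodes one 4-bit code into a 16-bit sample and updates the decoder state.
int16_t decodeNibble(ImaState &state, uint8_t nibble) {
	const int step = kStepTable[state.stepIndex];

	// The low three bits select fractions of the current step; bit 3 is the sign.
	int diff = step >> 3;
	if (nibble & 1) diff += step >> 2;
	if (nibble & 2) diff += step >> 1;
	if (nibble & 4) diff += step;
	state.predictor += (nibble & 8) ? -diff : diff;
	state.predictor = std::clamp(state.predictor, -32768, 32767);

	// Large codes grow the step size for the next sample, small codes shrink it.
	state.stepIndex = std::clamp(state.stepIndex + kIndexAdjust[nibble & 7], 0, 88);
	return static_cast<int16_t>(state.predictor);
}
```

<p>Starting from the default state, even the largest positive code (<code class="language-plaintext highlighter-rouge">7</code>) only moves the output to 11 and then 41 over the first two samples. The step size has to ramp up before the decoder can track a loud signal, which is consistent with the garbage at the start of each clip described in the quote.</p>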

<p>Since our approach is to do everything with Ghidra/Codex, I thought this would be a good opportunity to try prompting GPT-5.4 with a bit more context to see what it would come back with. I fed it the following prompt:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>based on https://codecs.multimedia.cx/2021/01/a-look-on-futurevision-formats/ "Music format is based on IMA ADPCM with minimalistic file header and the default predictor. Which means it needs some time to adapt to the actual coming signal amplitudes. In result the old format description based on reverse engineering recommends skipping first 7-57 bytes right after the header because of the garbage sound it produces. When I looked into the binary reference (an adventure that will be described in an upcoming post), it turned out that in certain cases they simply zero out the first 50 decoded samples."

Can we look into how realistic this may be based on what's in ghidra? The audio still produces a popping sound on each sample being played, so maybe this might help address that
</pre></td></tr></tbody></table></code></pre></div></div>

<p>After processing for a few minutes, the response was promising:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre>Yes. Based on live Ghidra, the article is not just plausible, it matches what HARVEST.LE actually does.

start_music_stream at 0x1bfc0 decodes the first FCMP chunk and then zeroes the first 0x64 decoded bytes before queueing it. On the common 16-bit streams, that is 50 samples.

load_sound_sample at 0x18470 and load_dialogue_voice_sample at 0x191d0 are even more aggressive: they queue from decoded_pcm + 0x1f4, so they discard the first 500 decoded bytes of one-shot/dialogue FCMP playback.
The current ScummVM Harvester decoder was doing neither, so a clip-start pop on CMP-backed audio is consistent with that mismatch.

I patched media_manager.cpp (line 127) to mirror the native split: music now mutes the first 100 decoded bytes, while startup speech/other FCMP playback trims the first 500 decoded bytes. WAV handling is unchanged. 
</pre></td></tr></tbody></table></code></pre></div></div>
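
<p>The byte offsets in that response are easier to compare with Kostya’s “first 50 decoded samples” once they are converted to sample counts. A quick sketch of the arithmetic, using hypothetical constant names and assuming mono PCM:</p>

```cpp
#include <cstdint>

// Converts a count of decoded PCM bytes into a count of samples (mono assumed).
constexpr uint32_t bytesToSamples(uint32_t byteCount, uint32_t bitsPerSample) {
	return byteCount / (bitsPerSample / 8);
}

// Offsets quoted in the response above; the constant names are illustrative.
constexpr uint32_t kMusicMuteBytes = 0x64;   // zeroed at the start of music
constexpr uint32_t kSampleTrimBytes = 0x1f4; // skipped for one-shot and dialogue audio

// 0x64 bytes of 16-bit PCM is the 50 samples mentioned in the article.
static_assert(bytesToSamples(kMusicMuteBytes, 16) == 50, "music warm-up");
// 0x1f4 bytes is 250 samples at 16 bits per sample, or 500 at 8 bits.
static_assert(bytesToSamples(kSampleTrimBytes, 16) == 250, "sample warm-up, 16-bit");
static_assert(bytesToSamples(kSampleTrimBytes, 8) == 500, "sample warm-up, 8-bit");
```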

<p>The <code class="language-plaintext highlighter-rouge">applyStartupFcmpWarmup</code> function that resulted from this prompt addressed the issue we were experiencing.</p>

<video width="640" height="480" controls="">
    <source src="/images/ghidra5/harvester-less-popping.mp4" type="video/mp4" />
    Your browser does not support the video tag
</video>

<p>There’s still a slight pop on the first FST video being loaded, but all scene sound effects and dialogue interactions are now pop-free.</p>

<p>The patch below shows how this was incorporated into the engine’s media manager.</p>

<div class="language-diff highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="gh">diff --git a/engines/harvester/media_manager.cpp b/engines/harvester/media_manager.cpp
index 691863468fa6..e953c0226cba 100644
</span><span class="gd">--- a/engines/harvester/media_manager.cpp
</span><span class="gi">+++ b/engines/harvester/media_manager.cpp
</span><span class="p">@@ -41,6 +41,11 @@</span> namespace Harvester {
 
 namespace {
 
<span class="gi">+enum class FcmpWarmupMode {
+	kMusic,
+	kSample
+};
+
</span> static const int8 kHarvesterImaIndexAdjustTable[16] = {
 	-1, -1, -1, -1, 2, 4, 6, 8,
 	-1, -1, -1, -1, 2, 4, 6, 8
<span class="p">@@ -119,7 +124,29 @@</span> static uint32 decodeHarvesterFcmp(byte *dest, const byte *src, uint32 srcSize, u
 	return decodedSize;
 }
 
<span class="gd">-static Audio::SeekableAudioStream *decodeStartupAudioStream(Common::SeekableReadStream *stream) {
</span><span class="gi">+static void applyStartupFcmpWarmup(byte *decodedPcm, uint32 &amp;decodedSize, FcmpWarmupMode warmupMode) {
+	if (!decodedPcm || decodedSize == 0)
+		return;
+
+	if (warmupMode == FcmpWarmupMode::kMusic) {
+		// Native start_music_stream zeros the first 0x64 decoded bytes of the first FCMP chunk.
+		memset(decodedPcm, 0, MIN&lt;uint32&gt;(decodedSize, 100));
+		return;
+	}
+
+	// Native load_sound_sample/load_dialogue_voice_sample queue from decoded_pcm + 0x1f4.
+	const uint32 trimBytes = MIN&lt;uint32&gt;(decodedSize, 500);
+	if (trimBytes == decodedSize) {
+		memset(decodedPcm, 0, decodedSize);
+		return;
+	}
+
+	memmove(decodedPcm, decodedPcm + trimBytes, decodedSize - trimBytes);
+	decodedSize -= trimBytes;
+}
+
+static Audio::SeekableAudioStream *decodeStartupAudioStream(Common::SeekableReadStream *stream,
+		FcmpWarmupMode warmupMode) {
</span> 	if (!stream)
 		return nullptr;
 
<span class="p">@@ -156,7 +183,7 @@</span> static Audio::SeekableAudioStream *decodeStartupAudioStream(Common::SeekableRead
 				return nullptr;
 			}
 
<span class="gd">-			const uint32 decodedSize = payloadSize * (bitsPerSample &gt;&gt; 2);
</span><span class="gi">+			uint32 decodedSize = payloadSize * (bitsPerSample &gt;&gt; 2);
</span> 			byte *decodedPcm = (byte *)malloc(decodedSize);
 			if (!decodedPcm) {
 				free(compressedPayload);
<span class="p">@@ -165,6 +192,7 @@</span> static Audio::SeekableAudioStream *decodeStartupAudioStream(Common::SeekableRead
 			}
 
 			decodeHarvesterFcmp(decodedPcm, compressedPayload, payloadSize, bitsPerSample);
<span class="gi">+			applyStartupFcmpWarmup(decodedPcm, decodedSize, warmupMode);
</span> 			free(compressedPayload);
 			delete stream;
 			return Audio::makeRawStream(decodedPcm, decodedSize, sampleRate,
<span class="p">@@ -177,8 +205,9 @@</span> static Audio::SeekableAudioStream *decodeStartupAudioStream(Common::SeekableRead
 	return Audio::makeWAVStream(stream, DisposeAfterUse::YES);
 }
 
<span class="gd">-static Audio::SeekableAudioStream *openStartupAudioStream(ResourceManager &amp;resources, const Common::String &amp;path) {
-	return decodeStartupAudioStream(resources.openFile(path));
</span><span class="gi">+static Audio::SeekableAudioStream *openStartupAudioStream(ResourceManager &amp;resources,
+		const Common::String &amp;path, FcmpWarmupMode warmupMode) {
+	return decodeStartupAudioStream(resources.openFile(path), warmupMode);
</span> }
 
 } // End of anonymous namespace
<span class="p">@@ -268,7 +297,8 @@</span> bool MediaManager::playMusic(const Common::String &amp;path) {
 		return true;
 	}
 
<span class="gd">-	Audio::SeekableAudioStream *audioStream = openStartupAudioStream(_resources, path);
</span><span class="gi">+	Audio::SeekableAudioStream *audioStream =
+		openStartupAudioStream(_resources, path, FcmpWarmupMode::kMusic);
</span> 	if (!audioStream) {
 		warning("Harvester: unable to decode startup music '%s'", path.c_str());
 		return false;
<span class="p">@@ -323,7 +353,8 @@</span> bool MediaManager::playSound(const Common::String &amp;path) {
 	_soundSlotIndex = (_soundSlotIndex + 1) % ARRAYSIZE(_soundHandles);
 	stopSoundHandle(_soundHandles[_soundSlotIndex]);
 
<span class="gd">-	Audio::SeekableAudioStream *audioStream = openStartupAudioStream(_resources, path);
</span><span class="gi">+	Audio::SeekableAudioStream *audioStream =
+		openStartupAudioStream(_resources, path, FcmpWarmupMode::kSample);
</span> 	if (!audioStream) {
 		warning("Harvester: unable to decode startup sound '%s'", path.c_str());
 		return false;
<span class="p">@@ -340,7 +371,8 @@</span> bool MediaManager::playSingleSound(const Common::String &amp;path) {
 		return false;
 
 	stopSoundHandle(_singleSoundHandle);
<span class="gd">-	Audio::SeekableAudioStream *audioStream = openStartupAudioStream(_resources, path);
</span><span class="gi">+	Audio::SeekableAudioStream *audioStream =
+		openStartupAudioStream(_resources, path, FcmpWarmupMode::kSample);
</span> 	if (!audioStream) {
 		warning("Harvester: unable to decode startup sound '%s'", path.c_str());
 		return false;
<span class="p">@@ -364,7 +396,8 @@</span> bool MediaManager::playSpeech(const Common::String &amp;path) {
 		return false;
 
 	stopSoundHandle(_speechHandle);
<span class="gd">-	Audio::SeekableAudioStream *audioStream = openStartupAudioStream(_resources, path);
</span><span class="gi">+	Audio::SeekableAudioStream *audioStream =
+		openStartupAudioStream(_resources, path, FcmpWarmupMode::kSample);
</span> 	if (!audioStream) {
 		warning("Harvester: unable to decode startup speech '%s'", path.c_str());
 		return false;
<span class="p">@@ -410,7 +443,7 @@</span> bool MediaManager::playLoadedSound(int slot) {
 
 	Common::SeekableReadStream *stream = new Common::MemoryReadStream(
 		_loadedSoundData[slot].data(), _loadedSoundData[slot].size(), DisposeAfterUse::NO);
<span class="gd">-	Audio::SeekableAudioStream *audioStream = decodeStartupAudioStream(stream);
</span><span class="gi">+	Audio::SeekableAudioStream *audioStream = decodeStartupAudioStream(stream, FcmpWarmupMode::kSample);
</span> 	if (!audioStream) {
 		warning("Harvester: unable to decode startup sound slot %d ('%s')",
 			slot, _loadedSoundPaths[slot].c_str());
</pre></td></tr></tbody></table></code></pre></div></div>]]></content><author><name></name></author><category term="Programming" /><category term="programming" /><category term="reverse-engineering" /><category term="scummvm" /><category term="ghidra" /><summary type="html"><![CDATA[Series:&nbsp;Reverse Engineering HarvesterThis review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.← Reverse Engineering Harvester with Ghidra and Codex - Part 4: Command OpcodesReverse Engineering Harvester with Ghidra and Codex - Part 5: Debugging Audio Issues→ Reverse Engineering Harvester with Ghidra and Codex - Part 6: Timers Article 5 of 6 in this series.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" /><media:content medium="image" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reverse Engineering Harvester with Ghidra and Codex - Part 4: Command Opcodes</title><link href="https://www.alexbevi.com/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-4-command-opcodes/" rel="alternate" type="text/html" title="Reverse Engineering Harvester with Ghidra and Codex - Part 4: Command Opcodes" /><published>2026-03-23T19:35:39-04:00</published><updated>2026-04-29T09:45:29-04:00</updated><id>https://www.alexbevi.com/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-4-command-opcodes</id><content type="html" xml:base="https://www.alexbevi.com/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-4-command-opcodes/"><![CDATA[<blockquote class="prompt-tip mb-6">
  <strong>Series:&nbsp;<a href="/blog/2026/03/14/reverse-engineering-a-dos-game-with-ghidra-and-codex/">Reverse Engineering Harvester</a></strong><p>This post is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.</p><ul class="list-none space-y-1"><li>← <a href="/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-3-file-formats/">Reverse Engineering Harvester with Ghidra and Codex - Part 3: File Formats</a></li><li><strong>Reverse Engineering Harvester with Ghidra and Codex - Part 4: Command Opcodes</strong></li><li>→ <a href="/blog/2026/03/29/reverse-engineering-harvester-with-ghidra-and-codex-part-5-debugging-audio-issues/">Reverse Engineering Harvester with Ghidra and Codex - Part 5: Debugging Audio Issues</a></li></ul>

  <p>Article 4 of 6 in this series.</p>
</blockquote>

<style>
.content pre, .content pre code {
white-space: pre-wrap !important;
word-break: break-word !important;
overflow-wrap: break-word !important;
overflow-x: hidden !important;
}

.highlight {
overflow-x: visible !important;
}
</style>

<p>Harvester’s startup / world script is not bytecode. It is XOR-obfuscated text, and opcode dispatch happens through <code class="language-plaintext highlighter-rouge">COMMAND</code> records in <code class="language-plaintext highlighter-rouge">HARVEST.SCR</code>:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>COMMAND triggerTag opcodeName arg1 arg2 arg3 [arg4]
</pre></td></tr></tbody></table></code></pre></div></div>

<p>In the original game and in ScummVM, these opcode names come from the data pipeline, not from a compiled bytecode table:</p>

<ul>
  <li>In ScummVM, <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L408-L463"><code class="language-plaintext highlighter-rouge">Script::load()</code></a> reads <code class="language-plaintext highlighter-rouge">HARVEST.SCR</code>, <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L589-L596"><code class="language-plaintext highlighter-rouge">decode()</code></a> XOR-deobfuscates it, and <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L598-L755"><code class="language-plaintext highlighter-rouge">parseTownRecords()</code></a> turns it into typed startup records.</li>
  <li>Within <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L598-L755"><code class="language-plaintext highlighter-rouge">parseTownRecords()</code></a>, <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L654-L676"><code class="language-plaintext highlighter-rouge">COMMAND</code> lines are parsed into <code class="language-plaintext highlighter-rouge">StartupCommandRecord</code> entries</a> alongside <code class="language-plaintext highlighter-rouge">ROOM</code>, <code class="language-plaintext highlighter-rouge">OBJECT</code>, <code class="language-plaintext highlighter-rouge">REGION</code>, <code class="language-plaintext highlighter-rouge">TIMER</code>, <code class="language-plaintext highlighter-rouge">USEITEM</code>, and related world records.</li>
  <li>Those other records provide the entry labels for command chains through room setup/exit, interactions, and timers.</li>
  <li><a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L1635-L1644"><code class="language-plaintext highlighter-rouge">findCommandRecord()</code></a> and <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L2091-L2528"><code class="language-plaintext highlighter-rouge">executeCommandChain()</code></a> resolve the current label to a <code class="language-plaintext highlighter-rouge">COMMAND</code> record, decode the opcode name on that line, and dispatch into engine handlers for room flow, media, inventory, actor state, and other subsystems.</li>
</ul>
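<p>To make the parsing step concrete, here is a minimal sketch of turning one decoded line into a record. It assumes tokens are whitespace-separated and optionally double-quoted, as in the examples in this post. <code class="language-plaintext highlighter-rouge">CommandRecord</code>, <code class="language-plaintext highlighter-rouge">tokenize()</code>, and <code class="language-plaintext highlighter-rouge">parseCommandLine()</code> are hypothetical names for illustration, not the actual ScummVM code.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for StartupCommandRecord; the real struct may differ.
struct CommandRecord {
    std::string triggerTag;
    std::string opcodeName;
    std::string args[4]; // arg1..arg4; a missing arg4 is stored as ""
};

// Split a decoded line on whitespace, keeping "double-quoted" runs together.
// Assumes quotes are never escaped inside a token.
std::vector<std::string> tokenize(const std::string &line) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < line.size()) {
        if (line[i] == ' ' || line[i] == '\t') {
            ++i;
            continue;
        }
        std::string token;
        if (line[i] == '"') {
            ++i; // skip the opening quote
            while (i < line.size() && line[i] != '"')
                token += line[i++];
            if (i < line.size())
                ++i; // skip the closing quote
        } else {
            while (i < line.size() && line[i] != ' ' && line[i] != '\t')
                token += line[i++];
        }
        tokens.push_back(token);
    }
    return tokens;
}

// Fill `out` from one line; returns false if the line is not a COMMAND record.
bool parseCommandLine(const std::string &line, CommandRecord &out) {
    const std::vector<std::string> t = tokenize(line);
    // COMMAND, triggerTag, opcodeName, arg1..arg3 are required; arg4 is optional.
    if (t.size() < 6 || t.size() > 7 || t[0] != "COMMAND")
        return false;
    out.triggerTag = t[1];
    out.opcodeName = t[2];
    for (std::size_t a = 0; a < 4; ++a)
        out.args[a] = (3 + a < t.size()) ? t[3 + a] : std::string();
    return true;
}

// Usage: a six-token record leaves arg4 empty.
void demoParseCommandLine() {
    CommandRecord record;
    const bool ok = parseCommandLine(
        "COMMAND \"TRY_SHED\" \"CHECK_FLAG\" \"HAS_SHED_KEY\" \"SHED_OPEN\" \"SHED_LOCKED\"", record);
    assert(ok);
    assert(record.opcodeName == "CHECK_FLAG");
    assert(record.args[3].empty());
    (void)ok;
}
```

<p>Storing a missing <code class="language-plaintext highlighter-rouge">arg4</code> as an empty string keeps "no next label" and "empty next label" on the same code path, which is a design choice of this sketch.</p>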

<h2 id="command-labels">Command Labels</h2>

<h3 id="triggertag"><code class="language-plaintext highlighter-rouge">triggerTag</code></h3>

<p><code class="language-plaintext highlighter-rouge">triggerTag</code> is the label attached to one <code class="language-plaintext highlighter-rouge">COMMAND</code> record. It is the string used to find that record later.</p>

<ul>
  <li>The parser stores it from the token immediately after <code class="language-plaintext highlighter-rouge">COMMAND</code> in <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L654-L676"><code class="language-plaintext highlighter-rouge">parseTownRecords()</code></a>.</li>
  <li><a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L1635-L1644"><code class="language-plaintext highlighter-rouge">Script::findCommandRecord()</code></a> resolves a command by comparing the requested tag string against <code class="language-plaintext highlighter-rouge">command.triggerTag</code>.</li>
</ul>

<p>So <code class="language-plaintext highlighter-rouge">triggerTag</code> is not a condition and not an opcode argument in the behavioral sense. It is the command node’s name.</p>
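<p>The lookup can be pictured as a plain linear search over the parsed records. This is a sketch under assumptions: the trimmed <code class="language-plaintext highlighter-rouge">CommandRecord</code> is hypothetical, and the exact, case-sensitive comparison may not match what the engine does.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, trimmed-down record: only the fields this lookup needs.
struct CommandRecord {
    std::string triggerTag;
    std::string opcodeName;
};

// Return the first record whose triggerTag equals `tag`, or nullptr.
// Exact, case-sensitive comparison is an assumption of this sketch.
const CommandRecord *findCommandRecord(const std::vector<CommandRecord> &commands,
                                       const std::string &tag) {
    for (const CommandRecord &command : commands) {
        if (command.triggerTag == tag)
            return &command;
    }
    return nullptr;
}

// Usage: the tag names the node; the opcode is whatever that node runs.
void demoFindCommandRecord() {
    const std::vector<CommandRecord> commands = {
        {"OPEN_GATE", "SET_FLAG"},
        {"OPEN_GATE_TEXT", "SHOW_TEXT"},
    };
    const CommandRecord *hit = findCommandRecord(commands, "OPEN_GATE_TEXT");
    assert(hit != nullptr && hit->opcodeName == "SHOW_TEXT");
    // Opcode names are not labels, so searching for one finds nothing.
    assert(findCommandRecord(commands, "SET_FLAG") == nullptr);
    (void)hit;
}
```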

<h3 id="currenttag"><code class="language-plaintext highlighter-rouge">currentTag</code></h3>

<p><code class="language-plaintext highlighter-rouge">currentTag</code> is the interpreter’s working variable while it walks a command chain.</p>

<ul>
  <li><a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L2091-L2528"><code class="language-plaintext highlighter-rouge">executeCommandChain()</code></a> initializes <code class="language-plaintext highlighter-rouge">currentTag</code> from the caller-supplied starting tag.</li>
  <li>It then resolves the current command with <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L1635-L1644"><code class="language-plaintext highlighter-rouge">findCommandRecord(currentTag)</code></a>.</li>
  <li>After each opcode runs, <code class="language-plaintext highlighter-rouge">currentTag</code> is updated to the next label:
    <ul>
      <li>branch opcodes like <code class="language-plaintext highlighter-rouge">CHECK_FLAG</code> and <code class="language-plaintext highlighter-rouge">CHECK_PERC</code> set it from <code class="language-plaintext highlighter-rouge">arg2</code> or <code class="language-plaintext highlighter-rouge">arg3</code></li>
      <li>most linear opcodes continue to <code class="language-plaintext highlighter-rouge">arg4</code></li>
      <li>deferred opcodes may stash <code class="language-plaintext highlighter-rouge">arg4</code> as a continuation tag and return to the caller instead of immediately continuing</li>
    </ul>
  </li>
</ul>

<p>If you think of the script as a graph, <code class="language-plaintext highlighter-rouge">triggerTag</code> is the node name stored in the file, and <code class="language-plaintext highlighter-rouge">currentTag</code> is the interpreter’s current node pointer.</p>
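<p>Here is a rough sketch of that node-pointer walk. It only models how <code class="language-plaintext highlighter-rouge">currentTag</code> is chosen after each step and leaves out what the opcodes actually do. The types, <code class="language-plaintext highlighter-rouge">walkChain()</code>, the step cap, and the choice of which opcodes count as deferred are assumptions for illustration, not the real <code class="language-plaintext highlighter-rouge">executeCommandChain()</code>.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical record layout for this sketch.
struct CommandRecord {
    std::string triggerTag, opcodeName, arg1, arg2, arg3, arg4;
};

struct ChainResult {
    std::vector<std::string> visited; // labels resolved, in order
    bool deferred;                    // true if a deferred opcode stopped the walk
    std::string continuationTag;      // arg4 stashed by that deferred opcode
};

ChainResult walkChain(const std::vector<CommandRecord> &commands,
                      const std::map<std::string, bool> &flags,
                      const std::string &startTag) {
    ChainResult result{{}, false, ""};
    std::string currentTag = startTag;
    const int kMaxSteps = 1000; // sketch-only guard against cyclic scripts

    for (int step = 0; step < kMaxSteps && !currentTag.empty(); ++step) {
        const CommandRecord *command = nullptr;
        for (const CommandRecord &candidate : commands) {
            if (candidate.triggerTag == currentTag) {
                command = &candidate;
                break;
            }
        }
        if (command == nullptr)
            break; // unknown label: stop walking
        result.visited.push_back(command->triggerTag);

        if (command->opcodeName == "CHECK_FLAG") {
            // Branch opcode: missing flags read as false.
            const auto it = flags.find(command->arg1);
            const bool value = (it != flags.end()) && it->second;
            currentTag = value ? command->arg2 : command->arg3;
        } else if (command->opcodeName == "SHOW_TEXT" || command->opcodeName == "GOFLIC") {
            // Deferred opcode: stash arg4 and hand control back to the caller.
            result.deferred = true;
            result.continuationTag = command->arg4;
            return result;
        } else {
            // Every other opcode is treated as linear here; its effect is omitted.
            currentTag = command->arg4;
        }
    }
    return result;
}

// Usage: a linear opcode followed by a deferred one.
void demoWalkChain() {
    const std::vector<CommandRecord> commands = {
        {"OPEN_GATE", "SET_FLAG", "GATE_OPEN", "T", "", "OPEN_GATE_TEXT"},
        {"OPEN_GATE_TEXT", "SHOW_TEXT", "Gate_Is_Open", "", "", ""},
    };
    const std::map<std::string, bool> noFlags;
    const ChainResult result = walkChain(commands, noFlags, "OPEN_GATE");
    assert(result.visited.size() == 2 && result.deferred);
    (void)result;
}
```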

<h2 id="where-starting-tags-come-from">Where Starting Tags Come From</h2>

<p>The interpreter does not enter command chains automatically just because a <code class="language-plaintext highlighter-rouge">COMMAND</code> record exists. Some other game record must point at its label.</p>

<p>Common entry points in the current engine:</p>

<ul>
  <li>object interaction uses <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L1386-L1413"><code class="language-plaintext highlighter-rouge">object.actionTag</code></a></li>
  <li>region interaction uses <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L1416-L1432"><code class="language-plaintext highlighter-rouge">region.actionTag</code></a></li>
  <li>use-item interaction uses <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L1435-L1450"><code class="language-plaintext highlighter-rouge">useItem.actionTag</code></a></li>
  <li>room enter / exit uses <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L1048-L1061"><code class="language-plaintext highlighter-rouge">room-&gt;onEnterCommand</code></a> and <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L1369-L1383"><code class="language-plaintext highlighter-rouge">room-&gt;onExitCommand</code></a></li>
  <li>timer execution starts from <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L1474-L1502"><code class="language-plaintext highlighter-rouge">timer-&gt;arg2</code></a></li>
</ul>

<p>That means a more precise reading of the format is:</p>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>COMMAND label opcodeName arg1 arg2 arg3 [nextLabel]
</pre></td></tr></tbody></table></code></pre></div></div>

<p>with the caveat that <code class="language-plaintext highlighter-rouge">arg2</code>, <code class="language-plaintext highlighter-rouge">arg3</code>, and <code class="language-plaintext highlighter-rouge">arg4</code> are opcode-specific, so only some of them are actually labels for a given opcode.</p>
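<p>One way to picture the caveat is a helper that lists the outgoing labels of a record. This is a sketch that only distinguishes the branch opcodes (<code class="language-plaintext highlighter-rouge">CHECK_FLAG</code>, <code class="language-plaintext highlighter-rouge">CHECK_PERC</code>) from everything else. <code class="language-plaintext highlighter-rouge">outgoingLabels()</code> and the record layout are hypothetical.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record layout for this sketch.
struct CommandRecord {
    std::string triggerTag, opcodeName, arg1, arg2, arg3, arg4;
};

// Which argument slots hold labels depends on the opcode.
std::vector<std::string> outgoingLabels(const CommandRecord &command) {
    std::vector<std::string> labels;
    const bool isBranch =
        command.opcodeName == "CHECK_FLAG" || command.opcodeName == "CHECK_PERC";
    if (isBranch) {
        // arg2 is taken when the test passes, arg3 when it fails.
        if (!command.arg2.empty())
            labels.push_back(command.arg2);
        if (!command.arg3.empty())
            labels.push_back(command.arg3);
    } else if (!command.arg4.empty()) {
        // arg4 is the next or continuation tag; arg2 and arg3 are plain values here.
        labels.push_back(command.arg4);
    }
    return labels;
}

// Usage: for SET_FLAG, arg2 ("T") is a value, so only arg4 is an edge.
void demoOutgoingLabels() {
    const CommandRecord setFlag = {"OPEN_GATE", "SET_FLAG", "GATE_OPEN", "T", "", "OPEN_GATE_TEXT"};
    const std::vector<std::string> labels = outgoingLabels(setFlag);
    assert(labels.size() == 1 && labels[0] == "OPEN_GATE_TEXT");
    (void)labels;
}
```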

<h2 id="examples">Examples</h2>

<h3 id="example-1-straight-line-chain">Example 1: straight-line chain</h3>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre>COMMAND "OPEN_GATE" "SET_FLAG" "GATE_OPEN" "T" "" "OPEN_GATE_TEXT"
COMMAND "OPEN_GATE_TEXT" "SHOW_TEXT" "Gate_Is_Open" "" "" ""
</pre></td></tr></tbody></table></code></pre></div></div>

<p>If an object’s <code class="language-plaintext highlighter-rouge">actionTag</code> is <code class="language-plaintext highlighter-rouge">"OPEN_GATE"</code>:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">executeCommandChain()</code> starts with <code class="language-plaintext highlighter-rouge">currentTag = "OPEN_GATE"</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">findCommandRecord("OPEN_GATE")</code> resolves the first line because its <code class="language-plaintext highlighter-rouge">triggerTag</code> is <code class="language-plaintext highlighter-rouge">"OPEN_GATE"</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">SET_FLAG</code> runs and then sets <code class="language-plaintext highlighter-rouge">currentTag = arg4</code>, which is <code class="language-plaintext highlighter-rouge">"OPEN_GATE_TEXT"</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">findCommandRecord("OPEN_GATE_TEXT")</code> resolves the second line.</li>
  <li><code class="language-plaintext highlighter-rouge">SHOW_TEXT</code> runs. Because it is deferred, the interpreter returns control to the caller instead of continuing immediately.</li>
</ol>

<h3 id="example-2-branch-on-a-flag">Example 2: branch on a flag</h3>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre>COMMAND "TRY_SHED" "CHECK_FLAG" "HAS_SHED_KEY" "SHED_OPEN" "SHED_LOCKED"
COMMAND "SHED_OPEN" "CHANGE_ROOM" "SHED_INT" "" "" ""
COMMAND "SHED_LOCKED" "SHOW_TEXT" "Need_A_Key" "" "" ""
</pre></td></tr></tbody></table></code></pre></div></div>

<p>If the chain starts at <code class="language-plaintext highlighter-rouge">"TRY_SHED"</code>:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">currentTag</code> starts as <code class="language-plaintext highlighter-rouge">"TRY_SHED"</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">CHECK_FLAG</code> looks up <code class="language-plaintext highlighter-rouge">HAS_SHED_KEY</code>.</li>
  <li>If the flag is true, <code class="language-plaintext highlighter-rouge">currentTag</code> becomes <code class="language-plaintext highlighter-rouge">arg2</code>, so the next lookup is <code class="language-plaintext highlighter-rouge">"SHED_OPEN"</code>.</li>
  <li>If the flag is false, <code class="language-plaintext highlighter-rouge">currentTag</code> becomes <code class="language-plaintext highlighter-rouge">arg3</code>, so the next lookup is <code class="language-plaintext highlighter-rouge">"SHED_LOCKED"</code>.</li>
</ol>

<p>So here the first <code class="language-plaintext highlighter-rouge">COMMAND</code> line is acting like a named branch node.</p>

<h3 id="example-3-deferred-opcode-with-continuation">Example 3: deferred opcode with continuation</h3>

<div class="language-text highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre>COMMAND "POTTS_EVENT" "GOFLIC" "GRAPHIC/FST/C001B.FST" "" "" "POTTS_AFTER_MOVIE"
COMMAND "POTTS_AFTER_MOVIE" "SET_FLAG" "STEPH_MIDGAME_PLAYED" "T" "" ""
</pre></td></tr></tbody></table></code></pre></div></div>

<p>When <code class="language-plaintext highlighter-rouge">currentTag</code> reaches <code class="language-plaintext highlighter-rouge">"POTTS_EVENT"</code>:</p>

<ol>
  <li><code class="language-plaintext highlighter-rouge">GOFLIC</code> does not immediately jump to <code class="language-plaintext highlighter-rouge">"POTTS_AFTER_MOVIE"</code>.</li>
  <li>Instead, it stores <code class="language-plaintext highlighter-rouge">arg4</code> as a continuation tag and returns the movie request to the caller.</li>
  <li>After the cutscene finishes, room/dialogue code can resume by starting another command-chain execution at <code class="language-plaintext highlighter-rouge">"POTTS_AFTER_MOVIE"</code>.</li>
</ol>

<p>That is why <code class="language-plaintext highlighter-rouge">arg4</code> is often best read as “the next tag after this opcode completes”, not just “the next line”.</p>
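<p>From the caller's side, the resume step might look like the sketch below. The <code class="language-plaintext highlighter-rouge">ChainOutput</code> struct, the <code class="language-plaintext highlighter-rouge">ChainRunner</code> callback, and <code class="language-plaintext highlighter-rouge">runWithCutscenes()</code> are all hypothetical. The real room/dialogue code handles more deferred output types than a single cutscene slot.</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical shape of what one chain execution hands back to its caller.
struct ChainOutput {
    bool hasCutscene;            // true when GOFLIC deferred a movie
    std::string cutscenePath;    // arg1 of that GOFLIC
    std::string continuationTag; // arg4 of that GOFLIC
};

using ChainRunner = std::function<ChainOutput(const std::string &startTag)>;

// Run a chain, "play" any deferred movie, then start a new chain execution
// at the stored continuation tag. Returns the movies played, in order.
std::vector<std::string> runWithCutscenes(const ChainRunner &runChain,
                                          const std::string &startTag) {
    assert(static_cast<bool>(runChain)); // an empty callback cannot run chains
    std::vector<std::string> played;
    std::string tag = startTag;
    const int kMaxResumes = 100; // sketch-only guard against endless resumes
    for (int i = 0; i < kMaxResumes && !tag.empty(); ++i) {
        const ChainOutput output = runChain(tag);
        if (!output.hasCutscene)
            break; // the chain finished without deferring a movie
        played.push_back(output.cutscenePath); // stand-in for real playback
        tag = output.continuationTag;          // resume point after the movie
    }
    return played;
}
```

<p>The point of the shape is that the interpreter never blocks on playback: it returns, and the continuation tag is just the starting tag of a later, separate execution.</p>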

<p>Most of the opcode recognition below lives in <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L2091-L2528"><code class="language-plaintext highlighter-rouge">Script::executeCommandChain()</code></a>, while deferred outputs such as modal text, dialogue continuations, lighting changes, player moves, and follow-up tags are consumed by the room interaction processor in <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/room.cpp#L922-L1236"><code class="language-plaintext highlighter-rouge">room.cpp</code></a>.</p>

<h2 id="control-flow-and-transitions">Control Flow And Transitions</h2>

<table>
  <thead>
    <tr>
      <th>Opcode</th>
      <th>Args used</th>
      <th>Effect</th>
      <th>Status / notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">CHANGE_CD</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=cdNumber</code></td>
      <td>Requests a switch to the CD given by <code class="language-plaintext highlighter-rouge">arg1</code>.</td>
      <td>Not Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">CHECK_FLAG</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=flagName</code>, <code class="language-plaintext highlighter-rouge">arg2=trueTag</code>, <code class="language-plaintext highlighter-rouge">arg3=falseTag</code></td>
      <td>Branches on the current runtime value of a flag. Missing flags read as false.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">CHECK_PERC</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=threshold</code>, <code class="language-plaintext highlighter-rouge">arg2=trueTag</code>, <code class="language-plaintext highlighter-rouge">arg3=falseTag</code></td>
      <td>Rolls <code class="language-plaintext highlighter-rouge">0..99</code> and branches on <code class="language-plaintext highlighter-rouge">roll &lt; threshold</code>. Threshold is clamped to <code class="language-plaintext highlighter-rouge">0..100</code>.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">EXEC_LIST</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=listName</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Runs each entry tag in an <code class="language-plaintext highlighter-rouge">EXEC_LIST</code> record and stops as soon as one produces deferred output. If none does, continues to <code class="language-plaintext highlighter-rouge">arg4</code>.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">START_DIALOG</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=npcName</code>, <code class="language-plaintext highlighter-rouge">arg4=continuationTag</code></td>
      <td>Defers into the room/dialogue system and resumes at <code class="language-plaintext highlighter-rouge">arg4</code> after the dialogue finishes.</td>
      <td>Implemented with caveat: if no dialogue context is supplied, the interpreter logs an unsupported-command message and aborts the current chain.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">GOFLIC</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=cutscenePath</code>, <code class="language-plaintext highlighter-rouge">arg4=continuationTag</code></td>
      <td>Defers a cutscene and stores <code class="language-plaintext highlighter-rouge">arg4</code> as the continuation tag to run after the movie.</td>
      <td>Implemented with caveat: if no cutscene output slot is provided, the interpreter logs and continues to <code class="language-plaintext highlighter-rouge">arg4</code> without playing a movie.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">GODEATHFLIC</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=cutscenePath</code></td>
      <td>Defers a death movie and requests a return to the main menu.</td>
      <td>Implemented with caveat: requires menu-exit context. Without it, the interpreter logs an unsupported-command message and aborts the current chain. If transitions are disabled, it logs a skipped transition and returns.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">CLOSEUP</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=targetName</code></td>
      <td>Requests a nested room / closeup transition.</td>
      <td>Implemented with caveat: if transitions are disabled, the opcode is skipped and the chain ends immediately.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">CHANGE_ROOM</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=targetName</code></td>
      <td>Requests a room handoff. In room gameplay, this queues the next room instead of nesting immediately.</td>
      <td>Implemented with caveat: if transitions are disabled, the opcode is skipped and the chain ends immediately.</td>
    </tr>
  </tbody>
</table>
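<p>The <code class="language-plaintext highlighter-rouge">CHECK_PERC</code> rule from the table is small enough to sketch directly. The roll is passed in so the example stays deterministic. In the engine it would come from the RNG. <code class="language-plaintext highlighter-rouge">checkPerc()</code> is a hypothetical helper, not the engine's function.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Pick the next tag for CHECK_PERC. `roll` must already be in 0..99.
std::string checkPerc(int threshold, int roll,
                      const std::string &trueTag, const std::string &falseTag) {
    assert(roll >= 0 && roll <= 99);
    // The threshold is clamped to 0..100 before the comparison.
    const int clamped = std::max(0, std::min(threshold, 100));
    return (roll < clamped) ? trueTag : falseTag;
}
```

<p>With this rule a threshold of <code class="language-plaintext highlighter-rouge">0</code> never takes the true branch and a threshold of <code class="language-plaintext highlighter-rouge">100</code> always does, since the largest possible roll is <code class="language-plaintext highlighter-rouge">99</code>.</p>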

<h2 id="world-and-runtime-state">World And Runtime State</h2>

<table>
  <thead>
    <tr>
      <th>Opcode</th>
      <th>Args used</th>
      <th>Effect</th>
      <th>Status / notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">SET_FLAG</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=flagName</code>, <code class="language-plaintext highlighter-rouge">arg2=value</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Creates or updates a runtime flag, then continues to <code class="language-plaintext highlighter-rouge">arg4</code>.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">SPOOL_MUSIC</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=musicPath</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Defers a startup music change.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ADD</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=ownerOrRoom</code>, <code class="language-plaintext highlighter-rouge">arg2=objectName</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Makes an object visible by setting <code class="language-plaintext highlighter-rouge">visible</code> and <code class="language-plaintext highlighter-rouge">runtimeVisible</code> true.</td>
      <td>Implemented. This is a visibility toggle, not an ownership transfer.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">DELETE</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=ownerOrRoom</code>, <code class="language-plaintext highlighter-rouge">arg2=objectName</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Makes an object invisible by setting <code class="language-plaintext highlighter-rouge">visible</code> and <code class="language-plaintext highlighter-rouge">runtimeVisible</code> false.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ADD2INV</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=objectName</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Moves an object into <code class="language-plaintext highlighter-rouge">INVENTORY</code>, makes it visible, and marks it identified.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">SET_ANIM</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=animName</code>, <code class="language-plaintext highlighter-rouge">arg2=active</code>, <code class="language-plaintext highlighter-rouge">arg3=visible</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Updates a runtime animation’s active / visible state.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">SET_REGION</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=regionName</code>, <code class="language-plaintext highlighter-rouge">arg2=enabledFlag</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Toggles <code class="language-plaintext highlighter-rouge">startEnabled</code> on a region. Any <code class="language-plaintext highlighter-rouge">arg2</code> other than <code class="language-plaintext highlighter-rouge">F</code> enables the region.</td>
      <td>Implemented with caveat: this does not touch <code class="language-plaintext highlighter-rouge">cursorEnabled</code>.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">SET_NPC</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=npcName</code>, <code class="language-plaintext highlighter-rouge">arg2=active</code>, <code class="language-plaintext highlighter-rouge">arg3=visible</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Updates a runtime NPC’s active / visible state.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">SET_MONSTER</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=monsterName</code>, <code class="language-plaintext highlighter-rouge">arg2=active</code>, <code class="language-plaintext highlighter-rouge">arg3=visible</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Updates a runtime monster’s active / visible state.</td>
      <td>Implemented with caveat: activating a monster forces visibility on and restores HP if the monster was dead.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">SET_TIMER</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=timerName</code>, <code class="language-plaintext highlighter-rouge">arg2=ON/OFF</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Enables or disables a timer. When enabling, resets <code class="language-plaintext highlighter-rouge">currentValue</code> to <code class="language-plaintext highlighter-rouge">initialValue</code>.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">KILL_TIMER</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=timerName</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Disables a timer.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">KILL_NPC</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=npcName</code>, <code class="language-plaintext highlighter-rouge">arg2=damageType</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Marks an NPC as dead / removed and optionally records damage type <code class="language-plaintext highlighter-rouge">BLUDGE</code>, <code class="language-plaintext highlighter-rouge">SLASH</code>, or <code class="language-plaintext highlighter-rouge">PROJ</code>.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">MONSTERFY</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=npcName</code>, <code class="language-plaintext highlighter-rouge">arg2=damageType</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Uses the same death/monsterfy flagging path as <code class="language-plaintext highlighter-rouge">KILL_NPC</code>, and also activates the NPC’s linked monster target when one exists.</td>
      <td>Implemented</td>
    </tr>
  </tbody>
</table>
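<p>Two of the rows above have behavior that is easy to get subtly wrong, so here is a small sketch of the timer and region rules. The <code class="language-plaintext highlighter-rouge">Timer</code> and <code class="language-plaintext highlighter-rouge">Region</code> structs and the <code class="language-plaintext highlighter-rouge">apply...</code> helpers are hypothetical. Treating every <code class="language-plaintext highlighter-rouge">SET_TIMER</code> value other than <code class="language-plaintext highlighter-rouge">ON</code> as off is an assumption.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical runtime records; field names follow the tables in this post.
struct Timer {
    int initialValue;
    int currentValue;
    bool enabled;
};

struct Region {
    bool startEnabled;
    bool cursorEnabled;
};

// SET_TIMER: enabling also resets currentValue to initialValue.
// Assumption: any arg2 other than "ON" disables the timer.
void applySetTimer(Timer &timer, const std::string &arg2) {
    timer.enabled = (arg2 == "ON");
    if (timer.enabled)
        timer.currentValue = timer.initialValue;
}

// KILL_TIMER: disables the timer and leaves its value alone.
void applyKillTimer(Timer &timer) {
    timer.enabled = false;
}

// SET_REGION: any arg2 other than "F" enables; cursorEnabled is not touched.
void applySetRegion(Region &region, const std::string &arg2) {
    region.startEnabled = (arg2 != "F");
}

// Usage: re-enabling a half-elapsed timer starts it over.
void demoSetTimer() {
    Timer timer = {60, 12, false};
    applySetTimer(timer, "ON");
    assert(timer.enabled && timer.currentValue == 60);
}
```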

<h2 id="player-and-ui">Player And UI</h2>

<table>
  <thead>
    <tr>
      <th>Opcode</th>
      <th>Args used</th>
      <th>Effect</th>
      <th>Status / notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">SHOW_TEXT</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=textKey</code>, <code class="language-plaintext highlighter-rouge">arg4=continuationTag</code></td>
      <td>Resolves a <code class="language-plaintext highlighter-rouge">TEXT</code> record and defers modal text display.</td>
      <td>Implemented with caveat: rendering currently requires <code class="language-plaintext highlighter-rouge">BOX1..BOX4</code>. Unknown text boxes are logged and not displayed.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">HEAL_PC</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=delta</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Adds <code class="language-plaintext highlighter-rouge">arg1</code> to current player HP, clamped to <code class="language-plaintext highlighter-rouge">0..30</code>.</td>
      <td>Implemented. Current code treats this as the same operation as <code class="language-plaintext highlighter-rouge">ADJ_HP</code>.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ADJ_HP</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=delta</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Adds <code class="language-plaintext highlighter-rouge">arg1</code> to current player HP, clamped to <code class="language-plaintext highlighter-rouge">0..30</code>.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">KILL_PC</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Sets player HP to <code class="language-plaintext highlighter-rouge">0</code>.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">PAUSE_PC</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Sets the runtime player-control-paused flag.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">RESUME_PC</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Clears the runtime player-control-paused flag.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">PC_GOTO_XZ</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=x</code>, <code class="language-plaintext highlighter-rouge">arg2=z</code>, <code class="language-plaintext highlighter-rouge">arg4=continuationTag</code></td>
      <td>Defers a player reposition request in room space.</td>
      <td>Implemented with caveat: if no player-move consumer is present, the interpreter logs and continues to <code class="language-plaintext highlighter-rouge">arg4</code> without moving the player.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">CHANGE_LIGHTING</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=mode</code>, <code class="language-plaintext highlighter-rouge">arg4=continuationTag</code></td>
      <td>Defers a lighting command. Supported parsed modes are <code class="language-plaintext highlighter-rouge">DIM</code>, <code class="language-plaintext highlighter-rouge">NORMAL</code>, <code class="language-plaintext highlighter-rouge">NONE</code>, and <code class="language-plaintext highlighter-rouge">FADE_IN</code>.</td>
      <td>Implemented with caveats: <code class="language-plaintext highlighter-rouge">NONE</code> maps to a black-screen command, not a no-op. <code class="language-plaintext highlighter-rouge">FADE_IN</code> is recognized but has no direct room-side effect yet. If no lighting consumer is present, the interpreter logs and continues.</td>
    </tr>
  </tbody>
</table>
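<p>Since <code class="language-plaintext highlighter-rouge">HEAL_PC</code> and <code class="language-plaintext highlighter-rouge">ADJ_HP</code> currently share one operation, both reduce to the same clamped add. A minimal sketch, with <code class="language-plaintext highlighter-rouge">adjustPlayerHp()</code> and the constant names being hypothetical:</p>

```cpp
#include <algorithm>
#include <cassert>

const int kMinPlayerHp = 0;
const int kMaxPlayerHp = 30;

// Shared by HEAL_PC and ADJ_HP in this sketch: add delta, then clamp to 0..30.
int adjustPlayerHp(int currentHp, int delta) {
    assert(currentHp >= kMinPlayerHp && currentHp <= kMaxPlayerHp);
    return std::max(kMinPlayerHp, std::min(currentHp + delta, kMaxPlayerHp));
}
```

<p>A negative <code class="language-plaintext highlighter-rouge">delta</code> works the same way, so under this reading a script can damage the player with either opcode.</p>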

<h2 id="audio">Audio</h2>

<p>The audio opcodes all share the same queueing path through <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/script.cpp#L309-L352"><code class="language-plaintext highlighter-rouge">appendStartupAudioCommand()</code></a> and are later handed off to <a href="https://github.com/alexbevi/scummvm/blob/0add95aa7714a90f1db930ad47609188775c7db9/engines/harvester/flow.cpp#L1896-L1899"><code class="language-plaintext highlighter-rouge">Flow::executeStartupAudioCommands()</code></a>.</p>

<table>
  <thead>
    <tr>
      <th>Opcode</th>
      <th>Args used</th>
      <th>Effect</th>
      <th>Status / notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">START_WAV</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=path</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Plays a sound effect on one of eight rotating SFX handles.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">START_SINGLE_WAV</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=path</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Plays a sound effect on one dedicated “single” SFX handle, replacing the prior one.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">LOAD_WAV</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=path</code>, <code class="language-plaintext highlighter-rouge">arg2=slot</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Loads a sound into a persistent slot for later playback.</td>
      <td>Implemented. Valid loaded-sound slots are <code class="language-plaintext highlighter-rouge">0..3</code>.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">PLAY_WAV</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=slot</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Plays a sound previously loaded by <code class="language-plaintext highlighter-rouge">LOAD_WAV</code>.</td>
      <td>Implemented</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">DELETE_WAV</code></td>
      <td><code class="language-plaintext highlighter-rouge">arg1=slot</code>, <code class="language-plaintext highlighter-rouge">arg4=nextTag</code></td>
      <td>Deletes a sound previously loaded by <code class="language-plaintext highlighter-rouge">LOAD_WAV</code>.</td>
      <td>Implemented</td>
    </tr>
  </tbody>
</table>

<h2 id="observed-engine-side-aliases-and-shared-paths">Observed Engine-Side Aliases And Shared Paths</h2>

<ul>
  <li><code class="language-plaintext highlighter-rouge">HEAL_PC</code> and <code class="language-plaintext highlighter-rouge">ADJ_HP</code> currently share the exact same implementation.</li>
  <li><code class="language-plaintext highlighter-rouge">KILL_NPC</code> and <code class="language-plaintext highlighter-rouge">MONSTERFY</code> share the same base handler; <code class="language-plaintext highlighter-rouge">MONSTERFY</code> additionally activates the linked monster target when present.</li>
  <li><code class="language-plaintext highlighter-rouge">CLOSEUP</code> and <code class="language-plaintext highlighter-rouge">CHANGE_ROOM</code> share the same transition-output path, differing only in the transition kind reported to room logic.</li>
</ul>]]></content><author><name></name></author><category term="Programming" /><category term="programming" /><category term="reverse-engineering" /><category term="scummvm" /><category term="ghidra" /><summary type="html"><![CDATA[Series:&nbsp;Reverse Engineering HarvesterThis review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.← Reverse Engineering Harvester with Ghidra and Codex - Part 3: File FormatsReverse Engineering Harvester with Ghidra and Codex - Part 4: Command Opcodes→ Reverse Engineering Harvester with Ghidra and Codex - Part 5: Debugging Audio Issues Article 4 of 6 in this series.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" /><media:content medium="image" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reverse Engineering Harvester with Ghidra and Codex - Part 3: File Formats</title><link href="https://www.alexbevi.com/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-3-file-formats/" rel="alternate" type="text/html" title="Reverse Engineering Harvester with Ghidra and Codex - Part 3: File Formats" /><published>2026-03-23T06:18:10-04:00</published><updated>2026-04-29T09:45:29-04:00</updated><id>https://www.alexbevi.com/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-3-file-formats</id><content type="html" xml:base="https://www.alexbevi.com/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-3-file-formats/"><![CDATA[<blockquote class="prompt-tip mb-6">
  <strong>Series:&nbsp;<a href="/blog/2026/03/14/reverse-engineering-a-dos-game-with-ghidra-and-codex/">Reverse Engineering Harvester</a></strong><p>This review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.</p><ul class="list-none space-y-1"><li>← <a href="/blog/2026/03/17/reverse-engineering-harvester-with-ghidra-and-codex-part-2/">Reverse Engineering Harvester with Ghidra and Codex - Part 2</a></li><li><strong>Reverse Engineering Harvester with Ghidra and Codex - Part 3: File Formats</strong></li><li>→ <a href="/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-4-command-opcodes/">Reverse Engineering Harvester with Ghidra and Codex - Part 4: Command Opcodes</a></li></ul>

  <p>Article 3 of 6 in this series.</p>
</blockquote>

<style>
.content pre, .content pre code {
white-space: pre-wrap !important;
word-break: break-word !important;
overflow-wrap: break-word !important;
overflow-x: hidden !important;
}

.highlight {
overflow-x: visible !important;
}
</style>

<h1 id="file-formats">File Formats</h1>

<p>This document tracks file-format facts that are confirmed by the current Harvester reverse-engineering work. It is intentionally conservative: if a field or behavior is not supported by code or verified runtime notes, it is left out.</p>

<p>The sections below combine three sources of evidence: the current ScummVM Harvester engine code, named native functions and data types in Ghidra, and spot-checks against extracted sample files from the Harvester CD image. Unknown or only partially understood fields stay marked as such.</p>

<h2 id="dat-xfile-archive-payload"><code class="language-plaintext highlighter-rouge">DAT</code> (XFILE archive payload)</h2>

<p>Harvester’s <code class="language-plaintext highlighter-rouge">.DAT</code> files act as payload containers for the game’s numbered XFILE resource sets. The important wrinkle is that the <code class="language-plaintext highlighter-rouge">.DAT</code> file is not the whole format by itself: the loader depends on a sidecar <code class="language-plaintext highlighter-rouge">INDEX.00N</code> file for the directory. In other words, the <code class="language-plaintext highlighter-rouge">.DAT</code> holds the bytes, while the <code class="language-plaintext highlighter-rouge">INDEX.00N</code> records tell the engine where each member starts, how large it is, and whether it must be unpacked after reading.</p>

<p>At cold start, the original game is confirmed to mount these pairs in this order:</p>

<table>
  <thead>
    <tr>
      <th>Set</th>
      <th>Index file</th>
      <th>Data file</th>
      <th>Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><code class="language-plaintext highlighter-rouge">INDEX.001</code></td>
      <td><code class="language-plaintext highlighter-rouge">HARVEST.DAT</code></td>
      <td>First archive-backed set mounted during startup. Confirmed startup art and UI assets are loaded through this set.</td>
    </tr>
    <tr>
      <td>2</td>
      <td><code class="language-plaintext highlighter-rouge">INDEX.002</code></td>
      <td><code class="language-plaintext highlighter-rouge">SOUND.DAT</code></td>
      <td>Second numbered set mounted during startup.</td>
    </tr>
    <tr>
      <td>3</td>
      <td><code class="language-plaintext highlighter-rouge">INDEX.003</code></td>
      <td><code class="language-plaintext highlighter-rouge">HARVEST2.DAT</code></td>
      <td>Third numbered set mounted during startup.</td>
    </tr>
  </tbody>
</table>

<h3 id="summary">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Archive payload file used by the XFILE resource layer</td>
    </tr>
    <tr>
      <td>Companion metadata</td>
      <td>Required sidecar <code class="language-plaintext highlighter-rouge">INDEX.00N</code> directory file</td>
    </tr>
    <tr>
      <td>Archive-wide header in <code class="language-plaintext highlighter-rouge">.DAT</code></td>
      <td>None confirmed; the current loader does not require an archive-wide header or in-file directory</td>
    </tr>
    <tr>
      <td>Addressing model</td>
      <td>Members are looked up by path in <code class="language-plaintext highlighter-rouge">INDEX.00N</code>, then read from absolute offsets in the <code class="language-plaintext highlighter-rouge">.DAT</code></td>
    </tr>
    <tr>
      <td>Path syntax in game code</td>
      <td>Archive-backed lookups use <code class="language-plaintext highlighter-rouge">&lt;set-number&gt;:\path\to\file.ext</code></td>
    </tr>
    <tr>
      <td>Endianness</td>
      <td>Mixed in the sidecar index: signature is checked as big-endian <code class="language-plaintext highlighter-rouge">XFLE</code>; numeric fields are little-endian 32-bit values</td>
    </tr>
    <tr>
      <td>Compression</td>
      <td>Optional per-entry packing; unpacked entries are read directly, packed entries are expanded after read</td>
    </tr>
  </tbody>
</table>

<h3 id="dat-payload-layout"><code class="language-plaintext highlighter-rouge">.DAT</code> payload layout</h3>

<table>
  <thead>
    <tr>
      <th>Offset</th>
      <th>Size</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code></td>
      <td>variable</td>
      <td>Raw member data addressed by the sidecar index. The current loader treats the file as a byte reservoir and does not require a confirmed global header before opening entries.</td>
    </tr>
  </tbody>
</table>

<p>This is sparse compared with many archive formats. The format’s structure is directory-driven rather than self-describing. All meaningful per-file metadata currently comes from the matching <code class="language-plaintext highlighter-rouge">INDEX.00N</code> file.</p>

<h3 id="index00n-directory-record-layout"><code class="language-plaintext highlighter-rouge">INDEX.00N</code> directory record layout</h3>

<p>Each confirmed sidecar index is read as a flat array of <code class="language-plaintext highlighter-rouge">0x94</code>-byte records:</p>

<table>
  <thead>
    <tr>
      <th>Offset</th>
      <th>Size</th>
      <th>Type</th>
      <th>Name</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>ASCII</td>
      <td><code class="language-plaintext highlighter-rouge">signature</code></td>
      <td><code class="language-plaintext highlighter-rouge">XFLE</code> magic. The loader rejects records whose first four bytes do not match this tag.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x80</code></td>
      <td>char[128]</td>
      <td><code class="language-plaintext highlighter-rouge">path</code></td>
      <td>NUL-terminated resource path string. This is the logical member name used for later lookups.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x84</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">archive_offset</code></td>
      <td>Absolute offset of the member data inside the companion <code class="language-plaintext highlighter-rouge">.DAT</code> file.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x88</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">stored_size</code></td>
      <td>Number of bytes stored in the <code class="language-plaintext highlighter-rouge">.DAT</code> for this member.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x8c</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">packed_flag</code></td>
      <td><code class="language-plaintext highlighter-rouge">0</code> means the member is stored verbatim. Any nonzero value takes the packed-entry decode path. The current loader does not distinguish between different nonzero flag values.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x90</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">unpacked_size</code></td>
      <td>Expected output size after decode for packed entries.</td>
    </tr>
  </tbody>
</table>
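
<p>As a rough illustration of the record layout above, here is a minimal Python sketch that walks a sidecar index as a flat array of <code class="language-plaintext highlighter-rouge">0x94</code>-byte records. The names <code class="language-plaintext highlighter-rouge">XfleRecord</code> and <code class="language-plaintext highlighter-rouge">parse_index</code> are hypothetical and do not come from the engine. Raising on a bad signature or on a length that is not a multiple of the record size is a choice made for this sketch, not confirmed loader behavior.</p>

```python
from typing import NamedTuple

RECORD_SIZE = 0x94  # 4 (signature) + 128 (path) + 4 * 4 (numeric fields)


class XfleRecord(NamedTuple):
    """Hypothetical container for one INDEX.00N directory record."""
    path: str
    archive_offset: int
    stored_size: int
    packed_flag: int
    unpacked_size: int


def parse_index(data: bytes) -> list[XfleRecord]:
    """Parse a flat array of 0x94-byte XFLE records."""
    # Assumption for this sketch: a ragged tail is treated as an error.
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("index length is not a multiple of 0x94")
    records = []
    for start in range(0, len(data), RECORD_SIZE):
        rec = data[start:start + RECORD_SIZE]
        if rec[0:4] != b"XFLE":
            raise ValueError(f"bad signature at offset {start:#x}")
        # The path is NUL-terminated inside a fixed 128-byte field.
        raw_path = rec[0x04:0x84].split(b"\x00", 1)[0]
        # The four numeric fields are little-endian 32-bit values.
        nums = [int.from_bytes(rec[off:off + 4], "little")
                for off in (0x84, 0x88, 0x8C, 0x90)]
        records.append(XfleRecord(raw_path.decode("ascii", "replace"), *nums))
    return records
```

<p>Each returned record carries everything needed to slice a member out of the companion <code class="language-plaintext highlighter-rouge">.DAT</code> file.</p>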

<h3 id="packed-entry-stream">Packed entry stream</h3>

<p>When <code class="language-plaintext highlighter-rouge">packed_flag != 0</code>, the loader reads <code class="language-plaintext highlighter-rouge">stored_size</code> bytes from the <code class="language-plaintext highlighter-rouge">.DAT</code> and expands them into an <code class="language-plaintext highlighter-rouge">unpacked_size</code> output buffer using a simple control-byte stream:</p>

<table>
  <thead>
    <tr>
      <th>Control byte range</th>
      <th>Meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code> to <code class="language-plaintext highlighter-rouge">0x80</code></td>
      <td>Copy the next <code class="language-plaintext highlighter-rouge">control</code> bytes literally into the output stream.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x81</code> to <code class="language-plaintext highlighter-rouge">0xff</code></td>
      <td>Read one byte and repeat it <code class="language-plaintext highlighter-rouge">control - 0x80</code> times.</td>
    </tr>
  </tbody>
</table>

<p>The decode loop stops once either the compressed input is exhausted or the output buffer reaches <code class="language-plaintext highlighter-rouge">unpacked_size</code>. Entries that fail to produce exactly <code class="language-plaintext highlighter-rouge">unpacked_size</code> bytes are treated as invalid.</p>
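
<p>The control-byte scheme is small enough to sketch in a few lines of Python. This follows the table and stop conditions described above. The function name <code class="language-plaintext highlighter-rouge">unpack_entry</code> is hypothetical, and treating a repeat run that overshoots <code class="language-plaintext highlighter-rouge">unpacked_size</code> as invalid is an assumption rather than confirmed native behavior.</p>

```python
def unpack_entry(packed: bytes, unpacked_size: int) -> bytes:
    """Expand a packed member using the literal/repeat control-byte stream."""
    out = bytearray()
    pos = 0
    # Stop when the input is exhausted or the output reaches unpacked_size.
    while len(packed) > pos and unpacked_size > len(out):
        control = packed[pos]
        pos += 1
        if control > 0x80:
            # 0x81..0xff: read one byte and repeat it (control - 0x80) times.
            if pos == len(packed):
                raise ValueError("truncated repeat run")
            out += bytes([packed[pos]]) * (control - 0x80)
            pos += 1
        else:
            # 0x00..0x80: copy the next `control` bytes literally.
            out += packed[pos:pos + control]
            pos += control
    # Entries that do not produce exactly unpacked_size bytes are invalid.
    if len(out) != unpacked_size:
        raise ValueError("decoded size does not match unpacked_size")
    return bytes(out)
```

<p>For example, the stream <code class="language-plaintext highlighter-rouge">03 61 62 63 84 7a</code> holds a three-byte literal followed by a four-byte run of <code class="language-plaintext highlighter-rouge">0x7a</code>.</p>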

<h3 id="path-handling-notes">Path handling notes</h3>

<table>
  <thead>
    <tr>
      <th>Behavior</th>
      <th>Detail</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Separators</td>
      <td>Resource paths are normalized from DOS-style backslashes to forward slashes during lookup.</td>
    </tr>
    <tr>
      <td>Prefix stripping</td>
      <td>Leading <code class="language-plaintext highlighter-rouge">./</code>, leading <code class="language-plaintext highlighter-rouge">/</code>, and <code class="language-plaintext highlighter-rouge">&lt;digit&gt;:/</code> prefixes are stripped during normalization.</td>
    </tr>
    <tr>
      <td>Case handling</td>
      <td>Archive member lookups are case-insensitive in the current implementation.</td>
    </tr>
    <tr>
      <td>Loose files vs archive paths</td>
      <td>Bare relative paths are handled separately by the direct-file path builder; <code class="language-plaintext highlighter-rouge">&lt;digit&gt;:\...</code> paths select an archive set instead.</td>
    </tr>
  </tbody>
</table>
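
<p>A minimal sketch of those normalization rules, assuming prefixes may be stacked and that case-insensitive lookup can be modelled by folding to lowercase. Both are assumptions of this sketch. The helper names and the sample paths in the usage note are hypothetical.</p>

```python
import re

# Leading "digit:/", "./", and "/" prefixes, applied after separator conversion.
_PREFIX = re.compile(r"^(?:\d:/|\./|/)+")
# A set-number prefix such as "1:" followed by either separator style.
_SET_PREFIX = re.compile(r"^(\d):[\\/]")


def normalize_member_path(path: str) -> str:
    """Return a lookup key for an archive member path."""
    forward = path.replace("\\", "/")  # DOS backslashes to forward slashes
    stripped = _PREFIX.sub("", forward)
    return stripped.lower()  # assumption: lowercase stands in for case-insensitivity


def archive_set(path: str):
    """Return the archive set number, or None for a bare loose-file path."""
    match = _SET_PREFIX.match(path)
    return int(match.group(1)) if match else None
```

<p>With these helpers, a hypothetical path such as <code class="language-plaintext highlighter-rouge">1:\GRAPHIC\ROOMS\TOWN.BM</code> selects set 1 and normalizes to <code class="language-plaintext highlighter-rouge">graphic/rooms/town.bm</code>, while a bare relative path selects no set at all.</p>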

<h3 id="practical-interpretation">Practical interpretation</h3>

<p>For extraction or tooling work, the current evidence supports treating Harvester’s <code class="language-plaintext highlighter-rouge">.DAT</code> archives as a two-file format:</p>

<ol>
  <li>Read the matching <code class="language-plaintext highlighter-rouge">INDEX.00N</code> file as a sequence of <code class="language-plaintext highlighter-rouge">0x94</code>-byte <code class="language-plaintext highlighter-rouge">XFLE</code> records.</li>
  <li>Use each record’s <code class="language-plaintext highlighter-rouge">archive_offset</code> and <code class="language-plaintext highlighter-rouge">stored_size</code> to slice bytes from the companion <code class="language-plaintext highlighter-rouge">.DAT</code>.</li>
  <li>If <code class="language-plaintext highlighter-rouge">packed_flag</code> is zero, the slice is the final file.</li>
  <li>If <code class="language-plaintext highlighter-rouge">packed_flag</code> is nonzero, expand the slice with the literal/repeat decoder above until <code class="language-plaintext highlighter-rouge">unpacked_size</code> bytes are produced.</li>
</ol>

<p>That model is enough to explain the startup resource mounts already confirmed in the reverse-engineering notes, and it matches the current ScummVM-side archive loader.</p>

<h2 id="rcs-quick-tip-text-lists"><code class="language-plaintext highlighter-rouge">RCS</code> (Quick-tip text lists)</h2>

<p>The only confirmed <code class="language-plaintext highlighter-rouge">.RCS</code> use so far is <code class="language-plaintext highlighter-rouge">ADJHEAD.RCS</code>, the quick-tips file shown by the startup/options overlay. Both the native <code class="language-plaintext highlighter-rouge">run_quick_tips_screen</code> path and the ScummVM <code class="language-plaintext highlighter-rouge">Flow::loadQuickTips</code> path treat it as plain text rather than as a binary container.</p>

<h3 id="summary-1">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Plaintext list of quick-tip strings</td>
    </tr>
    <tr>
      <td>Confirmed consumer</td>
      <td>Startup quick-tips overlay and options-menu quick-tips screen</td>
    </tr>
    <tr>
      <td>Encoding</td>
      <td>Plain ASCII text in the sampled file</td>
    </tr>
    <tr>
      <td>Header</td>
      <td>None</td>
    </tr>
    <tr>
      <td>Delimiters</td>
      <td>CR/LF line endings in the sampled file; loaders split on <code class="language-plaintext highlighter-rouge">\n</code> and ignore <code class="language-plaintext highlighter-rouge">\r</code></td>
    </tr>
  </tbody>
</table>

<h3 id="file-layout">File layout</h3>

<table>
  <thead>
    <tr>
      <th>Offset / unit</th>
      <th>Size</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00..EOF</code></td>
      <td>variable</td>
      <td>Sequence of text lines. Each non-empty trimmed line becomes one quick-tip string.</td>
    </tr>
  </tbody>
</table>

<h3 id="notes">Notes</h3>

<ul>
  <li>The sampled <code class="language-plaintext highlighter-rouge">ADJHEAD.RCS</code> begins directly with readable text: <code class="language-plaintext highlighter-rouge">Double click the left mouse button...</code>.</li>
  <li>The native quick-tips path advances through the file with <code class="language-plaintext highlighter-rouge">read_line_from_file_stream</code>, wraps back to the start on EOF, and chooses a random starting point the first time it runs.</li>
  <li>This is a good example of Harvester’s resource pipeline staying pragmatic: not every gameplay-facing asset is wrapped in a custom binary format.</li>
</ul>
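
<p>The loading and cycling behavior described in this section can be approximated as follows. This is only a sketch: <code class="language-plaintext highlighter-rouge">load_quick_tips</code> and <code class="language-plaintext highlighter-rouge">TipCycler</code> are hypothetical names, and the native code advances a file stream rather than indexing a list, so the list-based wraparound here is a simplification.</p>

```python
import random


def load_quick_tips(raw: bytes) -> list[str]:
    """Split on LF, ignore CR, and keep each non-empty trimmed line."""
    tips = []
    for line in raw.decode("ascii", "replace").split("\n"):
        line = line.replace("\r", "").strip()
        if line:
            tips.append(line)
    return tips


class TipCycler:
    """Cycle through tips from a random starting point, wrapping at the end."""

    def __init__(self, tips, rng=None):
        if not tips:
            raise ValueError("no quick tips loaded")
        self.tips = tips
        rng = rng or random.Random()
        # A random starting point is chosen the first time the cycler runs.
        self.pos = rng.randrange(len(tips))

    def next_tip(self) -> str:
        tip = self.tips[self.pos]
        # Wrap back to the first tip after the last one (EOF in the native path).
        self.pos = (self.pos + 1) % len(self.tips)
        return tip
```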

<h2 id="dialogrsp-dialogue-response-and-keyword-text"><code class="language-plaintext highlighter-rouge">DIALOG.RSP</code> (Dialogue response and keyword text)</h2>

<p><code class="language-plaintext highlighter-rouge">DIALOG.RSP</code> is a plaintext line table used by the dialogue UI. It does not appear to contain ids, offsets, or a binary header; the engine indexes it by zero-based line number and then uses the line text directly.</p>

<h3 id="summary-2">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Plaintext dialogue response / keyword string table</td>
    </tr>
    <tr>
      <td>Confirmed consumers</td>
      <td><code class="language-plaintext highlighter-rouge">load_dialogue_response_line</code>, response menu, keyword menu</td>
    </tr>
    <tr>
      <td>Encoding</td>
      <td>Plain ASCII text in the sampled file</td>
    </tr>
    <tr>
      <td>Header</td>
      <td>None</td>
    </tr>
    <tr>
      <td>Addressing model</td>
      <td>Zero-based line index</td>
    </tr>
  </tbody>
</table>

<h3 id="file-layout-1">File layout</h3>

<table>
  <thead>
    <tr>
      <th>Unit</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Line <code class="language-plaintext highlighter-rouge">n</code></td>
      <td>One response, keyword label, or menu text string</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">/</code> within a line</td>
      <td>In response/keyword UI contexts, splits one source line into multiple visible menu items</td>
    </tr>
  </tbody>
</table>

<h3 id="notes-1">Notes</h3>

<ul>
  <li>The native loader reads a requested zero-based line, strips the trailing CR/LF, and returns the shared text buffer directly.</li>
  <li><code class="language-plaintext highlighter-rouge">run_dialogue_response_menu</code> and <code class="language-plaintext highlighter-rouge">run_dialogue_keyword_menu</code> split visible options on <code class="language-plaintext highlighter-rouge">/</code>; there is no confirmed hidden topic-id layer behind those labels.</li>
  <li><code class="language-plaintext highlighter-rouge">load_dialogue_index</code> also pulls zero-based line <code class="language-plaintext highlighter-rouge">13</code> from <code class="language-plaintext highlighter-rouge">DIALOG.RSP</code> to seed the default keyword topic; in the sampled file that line is <code class="language-plaintext highlighter-rouge">BYE</code> (one-based line 14).</li>
  <li><code class="language-plaintext highlighter-rouge">DIALOG.RSP</code> shows Harvester leaning on designer-editable text tables even inside a fairly custom dialogue stack.</li>
</ul>
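
<p>A small sketch of the line-indexed lookup and the <code class="language-plaintext highlighter-rouge">/</code> split described above. The function names are hypothetical, the sample text in the usage note is made up, and raising <code class="language-plaintext highlighter-rouge">IndexError</code> for an out-of-range line is a choice made for this sketch rather than recovered behavior.</p>

```python
def load_response_line(raw: bytes, index: int) -> str:
    """Return the zero-based line `index` with its trailing CR/LF removed."""
    lines = raw.decode("ascii", "replace").split("\n")
    if index not in range(len(lines)):
        raise IndexError("line index outside DIALOG.RSP")
    return lines[index].rstrip("\r")


def menu_items(line: str) -> list[str]:
    """Split one source line into the visible menu options."""
    return line.split("/")
```

<p>Given a table whose second line is <code class="language-plaintext highlighter-rouge">YES/NO/MAYBE</code>, asking for line <code class="language-plaintext highlighter-rouge">1</code> and splitting it yields three separate menu entries.</p>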

<h2 id="dialogueidx-subtitle-index"><code class="language-plaintext highlighter-rouge">DIALOGUE.IDX</code> (Subtitle index)</h2>

<p><code class="language-plaintext highlighter-rouge">DIALOGUE.IDX</code> is an XOR-obfuscated text file that maps voice sample ids to subtitle strings. The native <code class="language-plaintext highlighter-rouge">load_dialogue_index</code> path and the ScummVM <code class="language-plaintext highlighter-rouge">Text::loadDialogueIndex</code> reimplementation both decode it with XOR <code class="language-plaintext highlighter-rouge">0xAA</code> while leaving CR/LF intact.</p>

<h3 id="summary-3">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Voice-id to subtitle-text index</td>
    </tr>
    <tr>
      <td>Encoding on disk</td>
      <td>ASCII text obfuscated with XOR <code class="language-plaintext highlighter-rouge">0xAA</code> on every byte except CR/LF</td>
    </tr>
    <tr>
      <td>Header</td>
      <td>None</td>
    </tr>
    <tr>
      <td>Confirmed delimiters</td>
      <td><code class="language-plaintext highlighter-rouge">NUL</code>, <code class="language-plaintext highlighter-rouge">LF</code>, <code class="language-plaintext highlighter-rouge">CR</code>, and form-feed (<code class="language-plaintext highlighter-rouge">0x0c</code>)</td>
    </tr>
    <tr>
      <td>Key native type</td>
      <td><code class="language-plaintext highlighter-rouge">DialogueIndexEntry { wav_id, text_offset, text_length }</code></td>
    </tr>
  </tbody>
</table>

<h3 id="decoded-stream-layout">Decoded stream layout</h3>

<table>
  <thead>
    <tr>
      <th>Sequence element</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>ASCII decimal token</td>
      <td><code class="language-plaintext highlighter-rouge">wav_id</code> for one spoken line</td>
    </tr>
    <tr>
      <td>Delimiter run</td>
      <td>One or more of <code class="language-plaintext highlighter-rouge">NUL</code>, <code class="language-plaintext highlighter-rouge">LF</code>, <code class="language-plaintext highlighter-rouge">CR</code>, <code class="language-plaintext highlighter-rouge">FF</code></td>
    </tr>
    <tr>
      <td>ASCII text token</td>
      <td>Subtitle text for that voice id</td>
    </tr>
    <tr>
      <td>Delimiter run</td>
      <td>Ends the subtitle record and starts the next id</td>
    </tr>
  </tbody>
</table>

<h3 id="derived-index-entry">Derived index entry</h3>

<table>
  <thead>
    <tr>
      <th>Field</th>
      <th>Meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">wav_id</code></td>
      <td>Positive decimal voice/sample id</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">text_offset</code></td>
      <td>Byte offset of the decoded subtitle text inside the decoded blob</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">text_length</code></td>
      <td>Subtitle length, clamped to <code class="language-plaintext highlighter-rouge">0x19a</code> bytes by the native loader</td>
    </tr>
  </tbody>
</table>

<h3 id="notes-2">Notes</h3>

<ul>
  <li>After XOR decode, the sampled file starts with a simple alternating pattern: <code class="language-plaintext highlighter-rouge">1</code>, <code class="language-plaintext highlighter-rouge">"junk init"</code>, <code class="language-plaintext highlighter-rouge">7</code>, <code class="language-plaintext highlighter-rouge">"Yes?"</code>, <code class="language-plaintext highlighter-rouge">11</code>, <code class="language-plaintext highlighter-rouge">"I need some help..."</code>, and so on.</li>
  <li>The native loader builds a 3000-entry in-memory table and keeps the decoded text blob around so <code class="language-plaintext highlighter-rouge">play_dialogue_line</code> can seek back into it cheaply.</li>
  <li>Architecturally, this splits dialogue cleanly in two: <code class="language-plaintext highlighter-rouge">DIALOGUE.IDX</code> carries spoken-line subtitles keyed by numeric wav ids, while <code class="language-plaintext highlighter-rouge">DIALOG.RSP</code> carries menu-facing response text.</li>
</ul>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-code"><pre><span class="n">python3</span> <span class="o">-</span><span class="n">c</span> <span class="sh">'</span><span class="s">import sys,pathlib,signal; signal.signal(signal.SIGPIPE, signal.SIG_DFL); d=pathlib.Path(sys.argv[1]).read_bytes(); sys.stdout.buffer.write(bytes(b if b in (10,13) else b ^ 0xAA for b in d))</span><span class="sh">'</span> <span class="n">DIALOGUE</span><span class="p">.</span><span class="n">IDX</span> <span class="o">|</span> <span class="n">head</span> <span class="o">-</span><span class="n">n</span> <span class="mi">10</span>
<span class="mi">1</span>
<span class="sh">"</span><span class="s">junk init</span><span class="sh">"</span>
<span class="mi">7</span>
<span class="sh">"</span><span class="s">Yes?</span><span class="sh">"</span>
<span class="mi">11</span>
<span class="sh">"</span><span class="s">I need some help... Mister...?</span><span class="sh">"</span>
<span class="mi">15</span>
<span class="sh">"</span><span class="s">Postmaster Boyle.  What can I do you for today?</span><span class="sh">"</span>
<span class="mi">25</span>
<span class="sh">"</span><span class="s">Sorry, youngster, we</span><span class="sh">'</span><span class="s">re out of applications right now.</span><span class="sh">"</span>
</pre></td></tr></tbody></table></code></pre></div></div>
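
<p>Building on that decode step, the derived index entries can be sketched by tokenizing the decoded blob on the four confirmed delimiters and pairing each id token with the text token that follows it. This is an approximation under stated assumptions: the helper names are hypothetical, quote characters are left inside the recorded text span because the source does not say whether the native loader strips them, and the 3000-entry table limit is not modelled.</p>

```python
from typing import NamedTuple

DELIMS = b"\x00\n\r\x0c"  # NUL, LF, CR, form-feed
MAX_TEXT = 0x19A          # subtitle length clamp described above


class DialogueIndexEntry(NamedTuple):
    wav_id: int
    text_offset: int
    text_length: int


def decode_xor(data: bytes) -> bytes:
    """XOR every byte with 0xAA except CR and LF."""
    return bytes(b if b in (10, 13) else b ^ 0xAA for b in data)


def tokens(blob: bytes):
    """Yield (offset, token) for each run of non-delimiter bytes."""
    pos, end_of_blob = 0, len(blob)
    while end_of_blob > pos:
        if blob[pos] in DELIMS:
            pos += 1
            continue
        end = pos
        while end_of_blob > end and blob[end] not in DELIMS:
            end += 1
        yield pos, blob[pos:end]
        pos = end


def build_index(blob: bytes) -> list[DialogueIndexEntry]:
    """Pair alternating id/text tokens into index entries."""
    toks = list(tokens(blob))
    entries = []
    for (_, id_tok), (offset, text) in zip(toks[0::2], toks[1::2]):
        wav_id = int(id_tok.decode("ascii"))
        entries.append(DialogueIndexEntry(wav_id, offset, min(len(text), MAX_TEXT)))
    return entries
```

<p>Keeping offsets into the decoded blob, rather than copying each string, mirrors the native approach of holding the blob in memory and seeking back into it.</p>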

<h2 id="scr-townworld-startup-script"><code class="language-plaintext highlighter-rouge">SCR</code> (Town/world startup script)</h2>

<p><code class="language-plaintext highlighter-rouge">HARVEST.SCR</code> is the central world-definition and startup-script file. It is XOR-obfuscated text, not bytecode. The native <code class="language-plaintext highlighter-rouge">load_xor_obfuscated_town_script</code> path and the ScummVM <code class="language-plaintext highlighter-rouge">Script::decode</code> path both XOR each non-CR/LF byte with <code class="language-plaintext highlighter-rouge">0xAA</code>, then parse whitespace-separated records with quoted strings preserved.</p>

<h3 id="summary-4">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>World database, startup configuration script, and command graph</td>
    </tr>
    <tr>
      <td>Encoding on disk</td>
      <td>ASCII text obfuscated with XOR <code class="language-plaintext highlighter-rouge">0xAA</code> on every byte except CR/LF</td>
    </tr>
    <tr>
      <td>Header</td>
      <td>None</td>
    </tr>
    <tr>
      <td>Comment syntax</td>
      <td>Lines whose first non-space character is <code class="language-plaintext highlighter-rouge">{</code> are skipped</td>
    </tr>
    <tr>
      <td>Tokenization</td>
      <td>Whitespace-delimited tokens with quoted strings kept intact</td>
    </tr>
  </tbody>
</table>
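
<p>To make the decode, comment, and tokenization rules concrete, here is a hedged Python sketch. The function names are hypothetical. Stripping the quote characters from quoted tokens is an assumption, since the source only says quoted strings are kept intact. The sketch also yields a flat token stream instead of guessing where one record ends and the next begins.</p>

```python
def decode_script(data: bytes) -> str:
    """XOR each non-CR/LF byte with 0xAA and return the text."""
    plain = bytes(b if b in (10, 13) else b ^ 0xAA for b in data)
    return plain.decode("ascii", "replace")


def tokenize_line(line: str) -> list[str]:
    """Split on whitespace while keeping quoted strings as single tokens."""
    result, current = [], []
    in_quotes = started = False
    for ch in line:
        if ch == '"':
            in_quotes = not in_quotes
            started = True  # so an empty "" still yields a token
        elif ch.isspace() and not in_quotes:
            if started:
                result.append("".join(current))
                current, started = [], False
        else:
            current.append(ch)
            started = True
    if started:
        result.append("".join(current))
    return result


def script_tokens(text: str):
    """Yield tokens from every line that is not blank or a comment."""
    for line in text.split("\n"):
        stripped = line.strip()
        # Lines whose first non-space character is "{" are comments.
        if not stripped or stripped.startswith("{"):
            continue
        yield from tokenize_line(stripped)
```

<p>A caller can then group the stream into records by scanning for the known tags and reading the leading numeric fields that precede each one.</p>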

<h3 id="confirmed-record-forms">Confirmed record forms</h3>

<table>
  <thead>
    <tr>
      <th>Tag</th>
      <th>Leading numeric fields</th>
      <th>Fields after the tag</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ENTRANCE</code></td>
      <td><code class="language-plaintext highlighter-rouge">x y z</code></td>
      <td><code class="language-plaintext highlighter-rouge">direction roomName entranceName</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">MAP_ENTRANCE</code></td>
      <td><code class="language-plaintext highlighter-rouge">mapX mapY initialPanelIndex</code></td>
      <td><code class="language-plaintext highlighter-rouge">entryName</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">MAP_LOCATION</code></td>
      <td><code class="language-plaintext highlighter-rouge">minX minY maxX maxY panelIndex labelX labelY</code></td>
      <td><code class="language-plaintext highlighter-rouge">labelText destinationEntranceName</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ROOM</code></td>
      <td><code class="language-plaintext highlighter-rouge">minZ maxZ maxZScreenY minZScreenY fullScaleZ maxZScalePercent</code></td>
      <td><code class="language-plaintext highlighter-rouge">roomName musicPath reservedString38 reservedString3c reservedString40 palettePath dimmable onEnterCommand onExitCommand</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">OBJECT</code></td>
      <td><code class="language-plaintext highlighter-rouge">initialX initialY boundsX2 boundsY2 initialZ zExtent</code></td>
      <td><code class="language-plaintext highlighter-rouge">initialOwnerOrRoom objectName spritePath altSpritePath reservedString40 inventoryTextKey reservedXFlag identTextKey operatable visible actionTag interactionLabel</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">ANIM</code></td>
      <td><code class="language-plaintext highlighter-rouge">x y z frameDelay</code></td>
      <td><code class="language-plaintext highlighter-rouge">roomName resourcePath animName active visible looping backward pingPong remove</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">NPC</code></td>
      <td><code class="language-plaintext highlighter-rouge">x y z frameDelay</code></td>
      <td><code class="language-plaintext highlighter-rouge">roomName modelPath npcName monsterfyTargetName active visible onDeathActionTag audioPath entityInitArg</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">MONSTER</code></td>
      <td><code class="language-plaintext highlighter-rouge">x y z engageDistance initialHitPoints damageAmount attackSoundTriggerFrame hitSoundTriggerFrame footstepSoundTriggerFrame deathSoundTriggerFrame</code></td>
      <td>Sample data and native analysis confirm that the three combat columns after <code class="language-plaintext highlighter-rouge">x y z</code> map to <code class="language-plaintext highlighter-rouge">MonsterRecord.engage_distance</code>, <code class="language-plaintext highlighter-rouge">initial_hit_points</code>, and <code class="language-plaintext highlighter-rouge">damage_amount</code>, followed by the four sound-trigger timing columns. The intermediate string columns at offsets <code class="language-plaintext highlighter-rouge">0x38</code>, <code class="language-plaintext highlighter-rouge">0x3c</code>, <code class="language-plaintext highlighter-rouge">0x44</code>, and <code class="language-plaintext highlighter-rouge">0x48</code> remain reserved in current data and have no recovered read-side consumers.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">REGION</code></td>
      <td><code class="language-plaintext highlighter-rouge">left top right bottom minZ maxZ</code></td>
      <td><code class="language-plaintext highlighter-rouge">regionName direction roomName actionTag startEnabled cursorEnabled</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">FLAG</code></td>
      <td>none</td>
      <td><code class="language-plaintext highlighter-rouge">name value</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">COMMAND</code></td>
      <td>none</td>
      <td><code class="language-plaintext highlighter-rouge">triggerTag opcodeName arg1 arg2 arg3 [arg4]</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">TEXT</code></td>
      <td>none</td>
      <td><code class="language-plaintext highlighter-rouge">key boxName value</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">HEAD</code></td>
      <td>none</td>
      <td><code class="language-plaintext highlighter-rouge">headId portraitPath</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">USEITEM</code></td>
      <td>none</td>
      <td><code class="language-plaintext highlighter-rouge">itemName ownerOrRoom targetName actionTag</code></td>
    </tr>
  </tbody>
</table>
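
<p>The decoding, comment, and tokenization rules above can be sketched in a few lines of Python. This is a minimal illustration, not the native parser: the names <code class="language-plaintext highlighter-rouge">xor_decode</code> and <code class="language-plaintext highlighter-rouge">parse_record</code> are hypothetical, and it assumes the tag is the first bare token that is not an integer.</p>

```python
import re

# One token is either a double-quoted string (quotes dropped) or a bare word.
# Backslashes inside quotes are kept as-is, so "1:\GRAPHIC\..." paths survive.
TOKEN_RE = re.compile(r'"([^"]*)"|(\S+)')
INT_RE = re.compile(r"-?\d+")


def xor_decode(data: bytes) -> bytes:
    """Undo the 0xAA obfuscation, leaving CR (13) and LF (10) untouched."""
    return bytes(b if b in (10, 13) else b ^ 0xAA for b in data)


def parse_record(line: str):
    """Split one decoded line into (tag, leading_numbers, string_fields).

    Returns None for blank lines and for comment lines starting with '{'.
    Assumption: the tag is the first bare token that is not an integer.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("{"):
        return None
    tag = None
    numbers = []
    fields = []
    for match in TOKEN_RE.finditer(stripped):
        quoted, bare = match.group(1), match.group(2)
        if tag is None and bare is not None and INT_RE.fullmatch(bare):
            numbers.append(int(bare))  # numeric column before the tag
        elif tag is None and bare is not None:
            tag = bare  # first bare non-numeric word is the record tag
        else:
            fields.append(quoted if quoted is not None else bare)
    return tag, numbers, fields
```

<p>Feeding it the first <code class="language-plaintext highlighter-rouge">ENTRANCE</code> line from the sample output should yield the tag, three numeric fields, and three string fields, which lines up with the table above.</p>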

<h3 id="notes-3">Notes</h3>

<ul>
  <li>The sampled file starts with readable structural records immediately after XOR decode, for example <code class="language-plaintext highlighter-rouge">ENTRANCE</code>, <code class="language-plaintext highlighter-rouge">OBJECT</code>, and later <code class="language-plaintext highlighter-rouge">ROOM</code>, <code class="language-plaintext highlighter-rouge">TEXT</code>, <code class="language-plaintext highlighter-rouge">HEAD</code>, and <code class="language-plaintext highlighter-rouge">COMMAND</code>.</li>
  <li>Paths embedded in script records are not uniform: archive-backed resources use <code class="language-plaintext highlighter-rouge">1:\...</code>, <code class="language-plaintext highlighter-rouge">2:\...</code>, <code class="language-plaintext highlighter-rouge">3:\...</code>, while some direct-file assets are bare relative paths such as <code class="language-plaintext highlighter-rouge">dialogue.idx</code>.</li>
  <li><code class="language-plaintext highlighter-rouge">GOFLIC</code> and <code class="language-plaintext highlighter-rouge">GODEATHFLIC</code> are especially revealing command names. In the sampled script, those opcodes point to <code class="language-plaintext highlighter-rouge">.FST</code> paths, which suggests the script vocabulary preserved older naming while the shipping runtime movie path used FST files.</li>
  <li><code class="language-plaintext highlighter-rouge">HARVEST.SCR</code> shows that Harvester’s resource architecture is data-driven at the top level: rooms, objects, dialogue portraits, commands, music, palettes, and cutscene triggers all converge here.</li>
</ul>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="rouge-code"><pre>python3 <span class="nt">-c</span> <span class="s1">'import sys,pathlib,signal; signal.signal(signal.SIGPIPE, signal.SIG_DFL); d=pathlib.Path(sys.argv[1]).read_bytes(); sys.stdout.buffer.write(bytes(b if b in (10,13) else b ^ 0xAA for b in d))'</span> HARVEST.SCR | <span class="nb">head</span> <span class="nt">-n</span> 10
<span class="o">{</span>// HARVESTER <span class="o">(</span>c<span class="o">)</span> 1995-96 Scripting Language
<span class="o">{</span>// Town
        0   0   0   ENTRANCE <span class="s2">"BACK"</span> <span class="s2">"TOWN_2_LODGE"</span> <span class="s2">"SERGEANT_2_LODGE"</span>
        0   0   0   ENTRANCE <span class="s2">"FRONT"</span> <span class="s2">""</span> <span class="s2">"SAVE_GAME"</span>
        492 67  554 96   0  1 OBJECT <span class="s2">"INVENTORY"</span> <span class="s2">"INV_EXIT"</span> <span class="s2">""</span> <span class="s2">""</span> <span class="s2">""</span> <span class="s2">""</span> <span class="s2">""</span> <span class="s2">"X"</span> <span class="s2">"F"</span> <span class="s2">"T"</span> <span class="s2">""</span> <span class="s2">"Inventory"</span>
        299 0   0   0    2  1 OBJECT <span class="s2">"NULL_ID"</span>          <span class="s2">"EXIT_BM"</span>  <span class="s2">"1:</span><span class="se">\G</span><span class="s2">RAPHIC</span><span class="se">\O</span><span class="s2">THER</span><span class="se">\E</span><span class="s2">XITSIGN.BM"</span> <span class="s2">""</span> <span class="s2">""</span> <span class="s2">""</span> <span class="s2">""</span>  <span class="s2">""</span> <span class="s2">"F"</span> <span class="s2">"T"</span> <span class="s2">""</span> <span class="s2">"exit"</span>
        299 0   375 61   0  1 OBJECT <span class="s2">"NULL_ID"</span>          <span class="s2">"EXIT_HS"</span>  <span class="s2">""</span>                             <span class="s2">""</span> <span class="s2">""</span> <span class="s2">""</span> <span class="s2">"X"</span> <span class="s2">""</span> <span class="s2">"T"</span> <span class="s2">"T"</span> <span class="s2">""</span> <span class="s2">"exit"</span>

<span class="o">{</span>// inventory health indicator
     72 314 0   0  <span class="nt">-12</span>  1 OBJECT <span class="s2">"INVENTORY"</span> <span class="s2">"INV_STAT1"</span> <span class="s2">"1:</span><span class="se">\g</span><span class="s2">raphic</span><span class="se">\o</span><span class="s2">ther</span><span class="se">\h</span><span class="s2">ead-a1.bm"</span> <span class="s2">""</span> <span class="s2">""</span> <span class="s2">"INV_STAT_ST"</span> <span class="s2">""</span> <span class="s2">""</span> <span class="s2">"F"</span> <span class="s2">"F"</span> <span class="s2">""</span> <span class="s2">"Health_Indicator"</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<h2 id="bm-static-indexed-bitmap"><code class="language-plaintext highlighter-rouge">BM</code> (Static indexed bitmap)</h2>

<p><code class="language-plaintext highlighter-rouge">.BM</code> is Harvester’s simplest custom image format: a small fixed header followed by raw 8-bit indexed pixels. It is used for UI panels, portraits, inventory art, help screens, and other non-animated images.</p>

<h3 id="summary-5">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Raw 8-bit indexed bitmap</td>
    </tr>
    <tr>
      <td>Confirmed consumers</td>
      <td><code class="language-plaintext highlighter-rouge">spawn_bitmap_entity_from_resource</code>, <code class="language-plaintext highlighter-rouge">reload_bitmap_entity_pixels_from_resource</code>, menu/help/dialogue overlay loaders</td>
    </tr>
    <tr>
      <td>Compression</td>
      <td>None confirmed</td>
    </tr>
    <tr>
      <td>Pixel format</td>
      <td>1 byte per pixel, palette-indexed</td>
    </tr>
  </tbody>
</table>

<h3 id="file-layout-2">File layout</h3>

<table>
  <thead>
    <tr>
      <th>Offset</th>
      <th>Size</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">width</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">height</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x08</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Unused/reserved in current analysis. Both native and ScummVM loaders ignore it; sampled <code class="language-plaintext highlighter-rouge">MOUSHELP.BM</code> stores zero here.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x0c</code></td>
      <td><code class="language-plaintext highlighter-rouge">width * height</code></td>
      <td>bytes</td>
      <td>Raw indexed pixel payload, row-major</td>
    </tr>
  </tbody>
</table>
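
<p>As a readable counterpart to the layout table, here is a small parsing sketch. The function name <code class="language-plaintext highlighter-rouge">parse_bm</code> is hypothetical; the only facts it relies on are the three little-endian header words and the <code class="language-plaintext highlighter-rouge">width * height</code> payload described above.</p>

```python
import struct

BM_HEADER_SIZE = 0x0C  # width, height, reserved (three uint32le values)


def parse_bm(data: bytes):
    """Return (width, height, pixels) for a raw .BM blob.

    The reserved word at 0x08 is read and ignored, matching the table above.
    """
    if len(data) < BM_HEADER_SIZE:
        raise ValueError("file too small for a BM header")
    width, height, _reserved = struct.unpack_from("<III", data, 0)
    end = BM_HEADER_SIZE + width * height
    if len(data) < end:
        raise ValueError("pixel payload is truncated")
    # Row-major, one palette index per pixel.
    return width, height, data[BM_HEADER_SIZE:end]
```

<p>The returned <code class="language-plaintext highlighter-rouge">pixels</code> buffer can be indexed as <code class="language-plaintext highlighter-rouge">pixels[y * width + x]</code>, since the payload is row-major.</p>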

<h3 id="notes-4">Notes</h3>

<ul>
  <li>The native binary reuses the same <code class="language-plaintext highlighter-rouge">RawBitmap</code>/<code class="language-plaintext highlighter-rouge">BitmapBuffer</code> shape in memory: <code class="language-plaintext highlighter-rouge">{ width, height, pixels }</code>.</li>
  <li>The format itself does not encode transparency, but many callers render it with palette index <code class="language-plaintext highlighter-rouge">0</code> treated as transparent.</li>
  <li>Compared with many later adventure engines, this is an aggressively direct format: no chunking, no embedded palette, and no per-row metadata.</li>
</ul>

<h2 id="pal-standalone-palette"><code class="language-plaintext highlighter-rouge">PAL</code> (Standalone palette)</h2>

<p>Harvester’s <code class="language-plaintext highlighter-rouge">.PAL</code> files are raw palette payloads with no header. The native palette upload helpers and ScummVM <code class="language-plaintext highlighter-rouge">Art::loadPalette</code> both treat them as 256 RGB triplets.</p>

<h3 id="summary-6">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Standalone 256-color palette resource</td>
    </tr>
    <tr>
      <td>Confirmed consumers</td>
      <td>Room setup, menus, help screens, wait overlay, town map</td>
    </tr>
    <tr>
      <td>Header</td>
      <td>None</td>
    </tr>
    <tr>
      <td>Payload size</td>
      <td>768 bytes (<code class="language-plaintext highlighter-rouge">256 * 3</code>)</td>
    </tr>
  </tbody>
</table>

<h3 id="file-layout-3">File layout</h3>

<table>
  <thead>
    <tr>
      <th>Offset / range</th>
      <th>Size</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x000..0x2ff</code></td>
      <td>768 bytes</td>
      <td>256 consecutive <code class="language-plaintext highlighter-rouge">(R, G, B)</code> triplets stored as 8-bit channel values</td>
    </tr>
  </tbody>
</table>

<h3 id="notes-5">Notes</h3>

<ul>
  <li>Native <code class="language-plaintext highlighter-rouge">upload_palette_to_vga</code> forces palette index <code class="language-plaintext highlighter-rouge">0</code> to black at upload time, applies brightness scaling, and then shifts the stored 8-bit channels down to VGA’s <code class="language-plaintext highlighter-rouge">0..63</code> DAC range.</li>
  <li>The sampled <code class="language-plaintext highlighter-rouge">INVHELP.PAL</code> contains full-range byte values up to <code class="language-plaintext highlighter-rouge">0xff</code>, which matches the native analysis that <code class="language-plaintext highlighter-rouge">.PAL</code> is stored as 8-bit RGB, not pre-divided 6-bit VGA values.</li>
  <li>This makes <code class="language-plaintext highlighter-rouge">.PAL</code> a nice contrast with FST’s embedded movie palettes, which are stored in 6-bit VGA form inside each frame payload.</li>
</ul>
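
<p>The upload steps in the first note can be approximated as follows. This is a sketch under stated assumptions rather than a transcription of <code class="language-plaintext highlighter-rouge">upload_palette_to_vga</code>: the function name, the <code class="language-plaintext highlighter-rouge">brightness</code> parameter (a simple multiplier here), and the use of a right shift by two bits to reach the 6-bit range are all illustrative choices.</p>

```python
def pal_to_vga_dac(pal: bytes, brightness: float = 1.0) -> bytes:
    """Convert a 768-byte 8-bit RGB palette to 6-bit VGA DAC values.

    Assumptions (not confirmed by the native code): brightness is a plain
    multiplier, and the 8-bit to 6-bit step is a right shift by 2.
    """
    if len(pal) < 768:
        raise ValueError("palette must contain 256 RGB triplets (768 bytes)")
    out = bytearray(768)  # index 0 stays (0, 0, 0): forced to black
    for i in range(3, 768):
        scaled = max(0, min(255, int(pal[i] * brightness)))
        out[i] = scaled >> 2  # 0..255 -> 0..63
    return bytes(out)
```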

<blockquote class="prompt-tip">
  <p>TRY IT OUT!</p>
  <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre>python3 <span class="nt">-m</span> pip <span class="nb">install </span>pillow
python3 <span class="nt">-c</span> <span class="s1">'from PIL import Image; import sys,struct; d=open(sys.argv[1],"rb").read(); w,h=struct.unpack_from("&lt;II",d,0); i=Image.frombytes("P",(w,h),d[12:12+w*h]); p=open(sys.argv[2],"rb").read()[:768] if len(sys.argv)&gt;2 else bytes(c for n in range(256) for c in (n,n,n)); i.putpalette(p); i.show()'</span> /path/to/<span class="k">*</span>.BM <span class="o">[</span>/path/to/<span class="k">*</span>.PAL]
</pre></td></tr></tbody></table></code></pre></div>  </div>

  <p><code class="language-plaintext highlighter-rouge">python3 -c '...' GRAPHIC/OTHER/INVHELP.BM</code> <br />
<img src="/images/ghidra3/bm-nopal.png" alt="" /> <br />
<code class="language-plaintext highlighter-rouge">python3 -c '...' GRAPHIC/OTHER/INVHELP.BM GRAPHIC/PAL/INVHELP.PAL</code> <br />
<img src="/images/ghidra3/bm-pal.png" alt="" /></p>
</blockquote>

<h2 id="cft-bitmap-font"><code class="language-plaintext highlighter-rouge">CFT</code> (Bitmap font)</h2>

<p><code class="language-plaintext highlighter-rouge">.CFT</code> packages a bitmap font as a fixed metrics header plus one raw 8-bit atlas image. The native <code class="language-plaintext highlighter-rouge">load_font_resource</code> and the ScummVM <code class="language-plaintext highlighter-rouge">Text::loadFont</code> / <code class="language-plaintext highlighter-rouge">HarvesterCftFont</code> code agree on the basic structure.</p>

<h3 id="summary-7">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Bitmap font resource with metrics tables and atlas</td>
    </tr>
    <tr>
      <td>Glyph count</td>
      <td>256 slots</td>
    </tr>
    <tr>
      <td>Rendering model</td>
      <td>Each glyph is a horizontal slice out of one shared atlas bitmap</td>
    </tr>
    <tr>
      <td>Confirmed consumers</td>
      <td>Menu text, room labels, dialogue, save/load UI, text-entry widgets</td>
    </tr>
  </tbody>
</table>

<h3 id="file-layout-4">File layout</h3>

<table>
  <thead>
    <tr>
      <th>Offset</th>
      <th>Size</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x000</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x40</code></td>
      <td>char[64]</td>
      <td>NUL-terminated font name (<code class="language-plaintext highlighter-rouge">HARVFONT</code> in the sampled file)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x040</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td>Font height</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x042</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x200</code></td>
      <td>uint16le[256]</td>
      <td>Glyph start-X table</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x242</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x200</code></td>
      <td>uint16le[256]</td>
      <td>Glyph width table</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x442</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td>Space width</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x444</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>unknown</td>
      <td>Unused/reserved in current analysis</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x448</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Atlas width</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x44c</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Atlas height</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x450</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>unknown</td>
      <td>Unused/reserved in current analysis; sampled <code class="language-plaintext highlighter-rouge">HARVFONT.CFT</code> stores zero here</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x454</code></td>
      <td><code class="language-plaintext highlighter-rouge">atlasWidth * atlasHeight</code></td>
      <td>bytes</td>
      <td>Raw atlas pixels, row-major, 8-bit indexed</td>
    </tr>
  </tbody>
</table>

<h3 id="notes-6">Notes</h3>

<ul>
  <li>The renderer derives each glyph by slicing <code class="language-plaintext highlighter-rouge">width</code> pixels from <code class="language-plaintext highlighter-rouge">x</code> inside the shared atlas. There is no per-glyph bitmap chunking.</li>
  <li><code class="language-plaintext highlighter-rouge">drawHeight</code> is effectively <code class="language-plaintext highlighter-rouge">atlasHeight - 1</code> in the current font renderer, which matches the native behavior of treating the last row as non-drawing padding.</li>
  <li>The font renderer treats both <code class="language-plaintext highlighter-rouge">' '</code> and <code class="language-plaintext highlighter-rouge">'_'</code> as space-width characters. That ties neatly back to the script/text resources, where labels such as <code class="language-plaintext highlighter-rouge">Health_Indicator</code> in <code class="language-plaintext highlighter-rouge">HARVEST.SCR</code> use underscores where visible text would have spaces.</li>
</ul>
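
<p>For readers who prefer something more legible than a one-liner, the header layout and the space/underscore rule can be sketched like this. <code class="language-plaintext highlighter-rouge">CftFont</code>, <code class="language-plaintext highlighter-rouge">parse_cft</code>, and <code class="language-plaintext highlighter-rouge">text_width</code> are hypothetical names, and falling back to the space width for code points above 255 is an assumption.</p>

```python
import struct
from typing import NamedTuple, Tuple


class CftFont(NamedTuple):
    name: str
    height: int
    starts: Tuple[int, ...]   # glyph start-X table, 256 entries
    widths: Tuple[int, ...]   # glyph width table, 256 entries
    space_width: int
    atlas_width: int
    atlas_height: int
    atlas: bytes              # row-major 8-bit indexed pixels


def parse_cft(data: bytes) -> CftFont:
    """Parse the fixed metrics header and atlas using the offsets listed above."""
    if len(data) < 0x454:
        raise ValueError("file too small for a CFT header")
    name = data[:0x40].split(b"\x00", 1)[0].decode("ascii", "replace")
    (height,) = struct.unpack_from("<H", data, 0x40)
    starts = struct.unpack_from("<256H", data, 0x42)
    widths = struct.unpack_from("<256H", data, 0x242)
    (space_width,) = struct.unpack_from("<H", data, 0x442)
    atlas_width, atlas_height = struct.unpack_from("<II", data, 0x448)
    end = 0x454 + atlas_width * atlas_height
    if len(data) < end:
        raise ValueError("atlas payload is truncated")
    return CftFont(name, height, starts, widths, space_width,
                   atlas_width, atlas_height, data[0x454:end])


def text_width(font: CftFont, text: str) -> int:
    """Sum glyph advances; ' ' and '_' both advance by the space width."""
    total = 0
    for ch in text:
        code = ord(ch)
        if ch in " _" or code > 255:  # > 255 fallback is an assumption
            total += font.space_width
        else:
            total += font.widths[code]
    return total
```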

<blockquote class="prompt-tip">
  <p>TRY IT OUT!</p>
  <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>python3 <span class="nt">-c</span> <span class="s1">'from PIL import Image; import sys,struct; d=open(sys.argv[1],"rb").read(); aw,ah=struct.unpack_from("&lt;II",d,0x448); h=max(1,min(struct.unpack_from("&lt;H",d,0x40)[0] or ah-1,ah-1)); sw=struct.unpack_from("&lt;H",d,0x442)[0] or 1; s=[struct.unpack_from("&lt;H",d,0x42+i*2)[0] for i in range(256)]; w=[struct.unpack_from("&lt;H",d,0x242+i*2)[0] for i in range(256)]; a=Image.frombytes("L",(aw,ah),d[0x454:0x454+aw*ah]); t=sys.argv[2]; W=sum(sw if ord(c)&gt;255 or c in " _" or w[ord(c)]&lt;=0 else w[ord(c)] for c in t); i=Image.new("L",(max(1,W),h),0); x=0; exec("for c in t:\n o=ord(c)\n gw=sw if o&gt;255 or c in \" _\" or w[o]&lt;=0 or s[o]&gt;=aw else w[o]\n if gw and o&lt;=255 and c not in \" _\" and w[o]&gt;0 and s[o]&lt;aw: i.paste(255,(x,0,x+gw,h),a.crop((s[o],0,s[o]+gw,h)))\n x+=gw"); i.resize((max(1,i.width*4),max(1,i.height*4)),Image.NEAREST).show()'</span> <span class="s2">"GRAPHIC/FONT/HARVFNT2.CFT"</span> <span class="s2">"Hello World"</span>
</pre></td></tr></tbody></table></code></pre></div>  </div>
  <p><img src="/images/ghidra3/hello-cft.png" alt="" /></p>
</blockquote>

<h2 id="abm-animated-bitmap--sprite-strip"><code class="language-plaintext highlighter-rouge">ABM</code> (Animated bitmap / sprite strip)</h2>

<p><code class="language-plaintext highlighter-rouge">.ABM</code> is Harvester’s main custom sprite/animation format. It backs cursor art, actor sprites, room animations, combat entities, and wait-overlay animation frames.</p>

<h3 id="summary-8">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Multi-frame indexed animation resource</td>
    </tr>
    <tr>
      <td>Confirmed consumers</td>
      <td><code class="language-plaintext highlighter-rouge">attach_abm_resource_to_entity</code>, <code class="language-plaintext highlighter-rouge">spawn_abm_entity_from_resource</code>, startup art loaders, room animation runtime</td>
    </tr>
    <tr>
      <td>Pixel format</td>
      <td>8-bit indexed pixels</td>
    </tr>
    <tr>
      <td>Compression</td>
      <td>Optional per-frame RLE-like stream</td>
    </tr>
  </tbody>
</table>

<h3 id="file-header">File header</h3>

<table>
  <thead>
    <tr>
      <th>Offset</th>
      <th>Size</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">frame_count</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Native runtime uses this value to size the temporary decoded-frame buffer before adding <code class="language-plaintext highlighter-rouge">0x10</code> bytes of slack. The current ScummVM loader does not otherwise interpret it. Sampled <code class="language-plaintext highlighter-rouge">BLOOD.ABM</code> stores <code class="language-plaintext highlighter-rouge">0x888</code> here.</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x08</code></td>
      <td>variable</td>
      <td>sequence</td>
      <td>First frame record begins here</td>
    </tr>
  </tbody>
</table>

<h3 id="per-frame-record">Per-frame record</h3>

<table>
  <thead>
    <tr>
      <th>Offset within frame</th>
      <th>Size</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>int32le</td>
      <td><code class="language-plaintext highlighter-rouge">x_offset</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>int32le</td>
      <td><code class="language-plaintext highlighter-rouge">y_offset</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x08</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">width</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x0c</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">height</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x10</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x01</code></td>
      <td>byte</td>
      <td><code class="language-plaintext highlighter-rouge">compressed_flag</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x11</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">encoded_size</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x15</code></td>
      <td><code class="language-plaintext highlighter-rouge">encoded_size</code></td>
      <td>bytes</td>
      <td>Encoded or raw pixel payload for this frame</td>
    </tr>
  </tbody>
</table>
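
<p>Walking the frame records is mostly bookkeeping, so a short sketch may help. It follows the table above literally, which makes the per-frame header 21 bytes (<code class="language-plaintext highlighter-rouge">0x15</code>) with no alignment padding. <code class="language-plaintext highlighter-rouge">AbmFrame</code> and <code class="language-plaintext highlighter-rouge">iter_abm_frames</code> are hypothetical names.</p>

```python
import struct
from typing import Iterator, NamedTuple

# x_offset, y_offset, width, height, compressed_flag, encoded_size.
# "<" disables alignment padding, so the size is 4+4+4+4+1+4 = 21 bytes.
FRAME_HEADER = struct.Struct("<iiIIBI")


class AbmFrame(NamedTuple):
    x: int
    y: int
    width: int
    height: int
    compressed: bool
    payload: bytes  # still encoded when compressed is True


def iter_abm_frames(data: bytes) -> Iterator[AbmFrame]:
    """Yield every frame record, starting at file offset 0x08."""
    frame_count, _buffer_hint = struct.unpack_from("<II", data, 0)
    offset = 8
    for _ in range(frame_count):
        x, y, width, height, flag, size = FRAME_HEADER.unpack_from(data, offset)
        start = offset + FRAME_HEADER.size
        payload = data[start:start + size]
        if len(payload) != size:
            raise ValueError("frame payload is truncated")
        yield AbmFrame(x, y, width, height, bool(flag), payload)
        offset = start + size  # next record follows immediately
```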

<h3 id="compressed-frame-stream">Compressed frame stream</h3>

<table>
  <thead>
    <tr>
      <th>Control byte form</th>
      <th>Meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00..0x7f</code></td>
      <td>Copy the next <code class="language-plaintext highlighter-rouge">control</code> bytes literally</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x80..0xff</code></td>
      <td>Read one byte and repeat it <code class="language-plaintext highlighter-rouge">control &amp; 0x7f</code> times</td>
    </tr>
  </tbody>
</table>
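
<p>The two control-byte forms translate into a very small decoder. The sketch below is illustrative: the name <code class="language-plaintext highlighter-rouge">decode_abm_rle</code> is hypothetical, and stopping quietly on a truncated stream (rather than raising) is a design choice, not confirmed native behavior.</p>

```python
def decode_abm_rle(stream: bytes, expected_size: int) -> bytes:
    """Expand an ABM compressed frame stream to at most expected_size bytes."""
    out = bytearray()
    i = 0
    while i < len(stream) and len(out) < expected_size:
        control = stream[i]
        i += 1
        if control < 0x80:
            # 0x00..0x7f: copy the next `control` bytes literally.
            out += stream[i:i + control]
            i += control
        else:
            # 0x80..0xff: read one byte and repeat it (control & 0x7f) times.
            if i >= len(stream):
                break  # truncated stream: stop instead of raising
            out += bytes([stream[i]]) * (control & 0x7F)
            i += 1
    return bytes(out[:expected_size])
```

<p>For a frame flagged as compressed, <code class="language-plaintext highlighter-rouge">expected_size</code> would be <code class="language-plaintext highlighter-rouge">width * height</code> from that frame's record.</p>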

<h3 id="notes-7">Notes</h3>

<ul>
  <li>Each frame decodes to exactly <code class="language-plaintext highlighter-rouge">width * height</code> bytes of indexed pixels.</li>
  <li>The sampled <code class="language-plaintext highlighter-rouge">BLOOD.ABM</code> declares 5 frames in its header; its first frame has offset <code class="language-plaintext highlighter-rouge">(4, 0)</code>, size <code class="language-plaintext highlighter-rouge">38 x 50</code>, and is marked compressed.</li>
</ul>

<blockquote class="prompt-tip">
  <p>TRY IT OUT!</p>
  <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>python3 <span class="nt">-c</span> <span class="s1">$'from PIL import Image</span><span class="se">\n</span><span class="s1">import sys,struct,tempfile,pathlib,webbrowser</span><span class="se">\n\n</span><span class="s1">def dec(s,n):</span><span class="se">\n</span><span class="s1"> i=0; o=bytearray()</span><span class="se">\n</span><span class="s1"> while i&lt;len(s) and len(o)&lt;n:</span><span class="se">\n</span><span class="s1">  k=s[i]; i+=1</span><span class="se">\n</span><span class="s1">  if k&lt;128:</span><span class="se">\n</span><span class="s1">   c=min(k,len(s)-i,n-len(o)); o+=s[i:i+c]; i+=c</span><span class="se">\n</span><span class="s1">  elif i&lt;len(s):</span><span class="se">\n</span><span class="s1">   o.extend([s[i]]*min(k&amp;127,n-len(o))); i+=1</span><span class="se">\n</span><span class="s1"> return bytes(o)</span><span class="se">\n\n</span><span class="s1">d=open(sys.argv[1],"rb").read()</span><span class="se">\n</span><span class="s1">pal=open(sys.argv[2],"rb").read()[:768] if len(sys.argv)&gt;2 else bytes(c for n in range(256) for c in (n,n,n))</span><span class="se">\n</span><span class="s1">fc=struct.unpack_from("&lt;I",d,0)[0]; off=8; F=[]; minx=miny=10**9; maxx=maxy=-10**9</span><span class="se">\n</span><span class="s1">for _ in range(fc):</span><span class="se">\n</span><span class="s1"> x,y,w,h=struct.unpack_from("&lt;iiii",d,off); c=d[off+16]; n=struct.unpack_from("&lt;I",d,off+17)[0]; s=d[off+21:off+21+n]; off+=21+n; p=s[:w*h] if not c else dec(s,w*h); F.append((x,y,w,h,p)); minx=min(minx,x); miny=min(miny,y); maxx=max(maxx,x+w); maxy=max(maxy,y+h)</span><span class="se">\n</span><span class="s1">minx=min(0,minx); miny=min(0,miny); W=maxx-minx; H=maxy-miny; G=[]</span><span class="se">\n</span><span class="s1">for x,y,w,h,p in F:</span><span class="se">\n</span><span class="s1"> src=Image.frombytes("P",(w,h),p); src.putpalette(pal); m=src.point(lambda v:0 if v==0 else 255,"L"); fr=Image.new("RGBA",(W,H),(0,0,0,0)); fr.paste(src.convert("RGBA"),(x-minx,y-miny),m); G.append(fr)</span><span class="se">\n</span><span class="s1">f=tempfile.NamedTemporaryFile(suffix=".gif",delete=False).name; G[0].save(f,save_all=True,append_images=G[1:],duration=100,loop=0,disposal=2,transparency=0); webbrowser.open(pathlib.Path(f).as_uri())'</span> /path/to/.ABM <span class="o">[</span>/path/to/.PAL]
</pre></td></tr></tbody></table></code></pre></div>  </div>
  <p>For example:  <br />
<code class="language-plaintext highlighter-rouge">python3 -c $'...' "GRAPHIC/ROOMANIM/CLOAK.ABM"</code> <br />
<img src="/images/ghidra3/cloak-nopal.gif" alt="" /> <br />
<code class="language-plaintext highlighter-rouge">python3 -c $'...' "GRAPHIC/ROOMANIM/CLOAK.ABM" "GRAPHIC/PAL/INVHELP.PAL"</code> <br />
<img src="/images/ghidra3/cloak-pal.gif" alt="" /></p>
</blockquote>

<h2 id="cmp-futurevision-compressed-audio"><code class="language-plaintext highlighter-rouge">CMP</code> (FutureVision compressed audio)</h2>

<p>See also the <a href="https://wiki.multimedia.cx/index.php/FutureVision_audio_formats">FutureVision audio formats</a> page on the MultimediaWiki.</p>

<p><code class="language-plaintext highlighter-rouge">.CMP</code> is Harvester’s custom compressed audio wrapper. The sampled files begin with <code class="language-plaintext highlighter-rouge">FCMP</code>, and the native <code class="language-plaintext highlighter-rouge">load_sound_sample</code> / <code class="language-plaintext highlighter-rouge">load_dialogue_voice_sample</code> code plus the ScummVM <code class="language-plaintext highlighter-rouge">decodeHarvesterFcmp</code> path all treat that payload as IMA-ADPCM-like compressed audio.</p>

<h3 id="summary-9">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Compressed audio for music, dialogue voice, and sound effects</td>
    </tr>
    <tr>
      <td>Confirmed magic</td>
      <td><code class="language-plaintext highlighter-rouge">FCMP</code></td>
    </tr>
    <tr>
      <td>Confirmed codec</td>
      <td>IMA-ADPCM-style nibble stream using the standard step/index tables recovered in the binary</td>
    </tr>
    <tr>
      <td>Supported output depths</td>
      <td>8-bit and 16-bit PCM</td>
    </tr>
  </tbody>
</table>

<h3 id="file-layout-5">File layout</h3>

<table>
  <thead>
    <tr>
      <th>Offset</th>
      <th>Size</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>ASCII</td>
      <td><code class="language-plaintext highlighter-rouge">FCMP</code> magic</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Compressed payload size</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x08</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Sample rate in Hz</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x0c</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td>Output bits per sample (<code class="language-plaintext highlighter-rouge">8</code> or <code class="language-plaintext highlighter-rouge">16</code>)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x0e</code></td>
      <td>variable</td>
      <td>bytes</td>
      <td>ADPCM payload</td>
    </tr>
  </tbody>
</table>

<h3 id="notes-8">Notes</h3>

<ul>
  <li>Sampled <code class="language-plaintext highlighter-rouge">MENACE.CMP</code> starts with <code class="language-plaintext highlighter-rouge">FCMP</code>, a payload size of <code class="language-plaintext highlighter-rouge">0x0db48b</code>, sample rate <code class="language-plaintext highlighter-rouge">22050</code>, and <code class="language-plaintext highlighter-rouge">16</code> bits per sample.</li>
  <li>The loader family is tolerant: some call paths accept either <code class="language-plaintext highlighter-rouge">FCMP</code> or raw <code class="language-plaintext highlighter-rouge">WAVE</code> data. The <code class="language-plaintext highlighter-rouge">.CMP</code> extension itself points to the compressed path, but the runtime checks the actual file signature before decoding.</li>
  <li>From an architectural perspective, this is the audio-side equivalent of Harvester’s XFILE abstraction: one wrapper format reused across music, speech, and effects.</li>
</ul>
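
<p>To make the layout table concrete, here is a minimal Python sketch of a header reader. The function and constant names are my own for illustration (they are not the native or ScummVM names), and the test bytes are synthesized from the <code class="language-plaintext highlighter-rouge">MENACE.CMP</code> values quoted above rather than read from the real file.</p>

```python
FCMP_MAGIC = b"FCMP"
FCMP_HEADER_SIZE = 0x0E  # magic (4) + payload size (4) + sample rate (4) + bits (2)


def read_le(data, offset, size):
    """Read an unsigned little-endian integer of `size` bytes at `offset`."""
    return int.from_bytes(data[offset:offset + size], "little")


def parse_fcmp_header(data):
    """Decode the fixed FCMP header described in the file layout table."""
    if len(data[:FCMP_HEADER_SIZE]) != FCMP_HEADER_SIZE:
        raise ValueError("truncated FCMP header")
    if data[0:4] != FCMP_MAGIC:
        raise ValueError("missing FCMP magic")
    bits = read_le(data, 0x0C, 2)
    if bits not in (8, 16):
        raise ValueError("unexpected bits per sample: %d" % bits)
    return {
        "payload_size": read_le(data, 0x04, 4),
        "sample_rate": read_le(data, 0x08, 4),
        "bits_per_sample": bits,
        "payload_offset": FCMP_HEADER_SIZE,  # ADPCM payload starts at 0x0e
    }


# Synthetic header rebuilt from the MENACE.CMP values in the notes above.
sample = (FCMP_MAGIC
          + (0x0DB48B).to_bytes(4, "little")
          + (22050).to_bytes(4, "little")
          + (16).to_bytes(2, "little"))
header = parse_fcmp_header(sample)
print(header)
```

<p>The sketch stops at the header on purpose: the ADPCM nibble order and predictor handling are not spelled out in this section, so decoding the payload is left to the real decoder.</p>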

<h2 id="fst-cutscene--streamed-animation"><code class="language-plaintext highlighter-rouge">FST</code> (Cutscene / streamed animation)</h2>

<p><code class="language-plaintext highlighter-rouge">.FST</code> is Harvester’s custom streamed movie format. It combines a file header, a compact per-frame index table, block-coded video payloads, optional per-frame palettes, and per-frame audio chunks.</p>

<h3 id="summary-10">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Cutscene / transition movie format</td>
    </tr>
    <tr>
      <td>Confirmed magic</td>
      <td><code class="language-plaintext highlighter-rouge">FST2</code> (<code class="language-plaintext highlighter-rouge">0x32545346</code> when the four bytes on disk are read as a little-endian uint32)</td>
    </tr>
    <tr>
      <td>Video model</td>
      <td>8-bit indexed frames decoded in <code class="language-plaintext highlighter-rouge">4 x 4</code> blocks</td>
    </tr>
    <tr>
      <td>Audio model</td>
      <td>Per-frame audio chunks described by the frame index table</td>
    </tr>
    <tr>
      <td>Confirmed consumers</td>
      <td><code class="language-plaintext highlighter-rouge">run_fst_sequence_player</code>, <code class="language-plaintext highlighter-rouge">play_fst_sequence</code>, startup intro path, scripted room transitions</td>
    </tr>
  </tbody>
</table>

<h3 id="file-header-1">File header</h3>

<table>
  <thead>
    <tr>
      <th>Offset</th>
      <th>Size</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>ASCII</td>
      <td><code class="language-plaintext highlighter-rouge">FST2</code> magic</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Frame width</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x08</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Frame height</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x0c</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">max_frame_size</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x10</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">frame_count</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x14</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">frame_rate</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x18</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">sample_rate</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x1c</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">bits_per_sample</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x20</code></td>
      <td><code class="language-plaintext highlighter-rouge">frame_count * 6</code></td>
      <td>array</td>
      <td><code class="language-plaintext highlighter-rouge">FstFrameIndexEntry[frame_count]</code></td>
    </tr>
  </tbody>
</table>

<h3 id="frame-index-entry">Frame index entry</h3>

<table>
  <thead>
    <tr>
      <th>Offset within entry</th>
      <th>Size</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td><code class="language-plaintext highlighter-rouge">video_size</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td><code class="language-plaintext highlighter-rouge">audio_size</code></td>
    </tr>
  </tbody>
</table>
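
<p>The two tables above can be read with a few lines of Python. This is a rough sketch rather than the engine’s loader: the helper names are hypothetical, and the sample blob is synthetic (two frames with made-up sizes and a made-up <code class="language-plaintext highlighter-rouge">max_frame_size</code>), borrowing only the dimensions, frame rate, and audio settings reported for <code class="language-plaintext highlighter-rouge">VIRGLOGO.FST</code>.</p>

```python
FST_MAGIC = b"FST2"
FST_HEADER_FIELDS = ("width", "height", "max_frame_size", "frame_count",
                     "frame_rate", "sample_rate", "bits_per_sample")
FST_INDEX_OFFSET = 0x20
FST_INDEX_ENTRY_SIZE = 6  # uint32le video_size + uint16le audio_size


def parse_fst_header(data):
    """Read the seven uint32le fields that follow the FST2 magic."""
    if data[0:4] != FST_MAGIC:
        raise ValueError("missing FST2 magic")
    header = {}
    for position, name in enumerate(FST_HEADER_FIELDS):
        offset = 0x04 + 4 * position
        header[name] = int.from_bytes(data[offset:offset + 4], "little")
    return header


def parse_fst_index(data, frame_count):
    """Return one (video_size, audio_size) tuple per frame."""
    entries = []
    for frame in range(frame_count):
        base = FST_INDEX_OFFSET + frame * FST_INDEX_ENTRY_SIZE
        video_size = int.from_bytes(data[base:base + 4], "little")
        audio_size = int.from_bytes(data[base + 4:base + 6], "little")
        entries.append((video_size, audio_size))
    return entries


# Synthetic two-frame file; the sizes below are invented for illustration.
fields = (320, 200, 4096, 2, 15, 22050, 16)
blob = FST_MAGIC + b"".join(v.to_bytes(4, "little") for v in fields)
blob += (1000).to_bytes(4, "little") + (2940).to_bytes(2, "little")
blob += (512).to_bytes(4, "little") + (2940).to_bytes(2, "little")

header = parse_fst_header(blob)
index = parse_fst_index(blob, header["frame_count"])
index_end = FST_INDEX_OFFSET + header["frame_count"] * FST_INDEX_ENTRY_SIZE
print(header, index, hex(index_end))
```

<p>Keeping the index as plain tuples mirrors how compact the on-disk table is: six bytes per frame are enough for a streaming player to know how much video and audio to read next.</p>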

<h3 id="video-payload-for-one-frame">Video payload for one frame</h3>

<table>
  <thead>
    <tr>
      <th>Sequence element</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">uint16 bit_count</code></td>
      <td>Number of bits in the control bitstream</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">((bit_count &gt;&gt; 3) + 1)</code> bytes</td>
      <td>Packed control-bit stream</td>
    </tr>
    <tr>
      <td>Optional <code class="language-plaintext highlighter-rouge">256 * 3</code> bytes</td>
      <td>Present only when the first control bit is set; this is a VGA-style 6-bit palette block, not a <code class="language-plaintext highlighter-rouge">.PAL</code> resource block</td>
    </tr>
    <tr>
      <td>Block payload stream</td>
      <td>One record per changed <code class="language-plaintext highlighter-rouge">4 x 4</code> tile across the frame</td>
    </tr>
  </tbody>
</table>
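
<p>As a sketch of how that sequence could be pulled apart, the Python below splits one frame’s video payload into its pieces. Several details are assumptions on my part and are flagged in the comments: that <code class="language-plaintext highlighter-rouge">bit_count</code> is little-endian like the other fields, that the “first control bit” is the least-significant bit of the first control byte (switchable via a parameter), and that 6-bit palette values are scaled linearly to 8 bits. The function name is hypothetical.</p>

```python
PALETTE_SIZE = 256 * 3


def split_fst_video_payload(payload, first_bit_is_lsb=True):
    """Split one frame's video payload into control bits, palette, and blocks."""
    # ASSUMPTION: bit_count is stored little-endian, like the header fields.
    bit_count = int.from_bytes(payload[0:2], "little")
    control_len = (bit_count >> 3) + 1  # formula from the table above
    cursor = 2
    control = payload[cursor:cursor + control_len]
    cursor += control_len

    # ASSUMPTION: which end of the first byte holds the "first" control bit.
    flag_mask = 0x01 if first_bit_is_lsb else 0x80
    palette = None
    if control[0] & flag_mask:
        raw = payload[cursor:cursor + PALETTE_SIZE]
        cursor += PALETTE_SIZE
        # ASSUMPTION: linear 6-bit to 8-bit scaling (0..63 becomes 0..255).
        palette = [tuple((c * 255) // 63 for c in raw[i:i + 3])
                   for i in range(0, PALETTE_SIZE, 3)]

    return {
        "bit_count": bit_count,
        "control": control,
        "palette": palette,
        "blocks": payload[cursor:],  # remaining per-tile records
    }


# Synthetic payload: 9 control bits, palette flag set, all-white palette.
demo = ((9).to_bytes(2, "little") + b"\x01\x00"
        + bytes([63]) * PALETTE_SIZE + b"\xAA\xBB")
frame = split_fst_video_payload(demo)
print(frame["bit_count"], len(frame["palette"]), frame["blocks"])
```

<p>If the real bit order turns out to be most-significant-bit first, passing <code class="language-plaintext highlighter-rouge">first_bit_is_lsb=False</code> is the only change needed here.</p>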

<h3 id="block-coding">Block coding</h3>

<table>
  <thead>
    <tr>
      <th>Control bits / payload</th>
      <th>Meaning</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Changed-bit <code class="language-plaintext highlighter-rouge">0</code></td>
      <td>Leave the existing <code class="language-plaintext highlighter-rouge">4 x 4</code> block unchanged</td>
    </tr>
    <tr>
      <td>Changed-bit <code class="language-plaintext highlighter-rouge">1</code>, mode-bit <code class="language-plaintext highlighter-rouge">0</code>, 16-byte payload</td>
      <td>Literal <code class="language-plaintext highlighter-rouge">4 x 4</code> pixel block</td>
    </tr>
    <tr>
      <td>Changed-bit <code class="language-plaintext highlighter-rouge">1</code>, mode-bit <code class="language-plaintext highlighter-rouge">1</code>, 4-byte payload</td>
      <td>Two colors plus a 16-bit mask describing a <code class="language-plaintext highlighter-rouge">4 x 4</code> block</td>
    </tr>
  </tbody>
</table>
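
<p>The two changed-block modes are easy to picture in code. The Python below is only an illustration of the idea, not the recovered decoder: the table does not state the field order of the 4-byte record or how mask bits map to pixels, so this sketch assumes <code class="language-plaintext highlighter-rouge">color0, color1, uint16le mask</code> with bit 0 as the top-left pixel in row-major order. Both function names are hypothetical.</p>

```python
def decode_literal_block(payload):
    """16 bytes become 4 rows of 4 palette indices (row-major order ASSUMED)."""
    if len(payload) != 16:
        raise ValueError("literal block needs exactly 16 bytes")
    return [list(payload[row * 4:row * 4 + 4]) for row in range(4)]


def decode_two_color_block(payload):
    """4 bytes become a 4x4 block built from two colors and a 16-bit mask.

    ASSUMED layout: color0, color1, then a little-endian 16-bit mask where
    bit i (least-significant first) picks color1 for pixel i, row-major.
    """
    if len(payload) != 4:
        raise ValueError("two-color block needs exactly 4 bytes")
    color0, color1 = payload[0], payload[1]
    mask = int.from_bytes(payload[2:4], "little")
    pixels = [color1 if (mask >> i) & 1 else color0 for i in range(16)]
    return [pixels[row * 4:row * 4 + 4] for row in range(4)]


literal = decode_literal_block(bytes(range(16)))
two_color = decode_two_color_block(bytes([5, 9, 0x01, 0x80]))
for row in two_color:
    print(row)
```

<p>Whatever the exact bit mapping is, the trade-off is the same: a two-color tile costs 4 bytes instead of 16, which is why flat or high-contrast regions compress well in this scheme.</p>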

<h3 id="notes-9">Notes</h3>

<ul>
  <li>Sampled <code class="language-plaintext highlighter-rouge">VIRGLOGO.FST</code> is <code class="language-plaintext highlighter-rouge">320 x 200</code>, stores <code class="language-plaintext highlighter-rouge">131</code> frames, plays at <code class="language-plaintext highlighter-rouge">15</code> fps, and carries <code class="language-plaintext highlighter-rouge">22050 Hz</code> / <code class="language-plaintext highlighter-rouge">16-bit</code> audio.</li>
  <li>The native and ScummVM players both treat FST as a streaming format: read one frame, queue its audio, decode its video, and advance without loading the whole movie at once.</li>
  <li><code class="language-plaintext highlighter-rouge">HARVEST.SCR</code> uses <code class="language-plaintext highlighter-rouge">GOFLIC</code>/<code class="language-plaintext highlighter-rouge">GODEATHFLIC</code> opcodes to trigger <code class="language-plaintext highlighter-rouge">.FST</code> files. That mismatch between opcode names and file extension is a strong architectural clue that FST replaced or wrapped an older movie concept without rewriting the script vocabulary.</li>
  <li>The censorship path also shows how self-contained the format is: FST can carry its own palette updates frame-by-frame, while the player temporarily swaps in an external <code class="language-plaintext highlighter-rouge">CENSORED.PCX</code> overlay when gore is disabled.</li>
</ul>

<h2 id="pcx-standard-indexed-still-image"><code class="language-plaintext highlighter-rouge">PCX</code> (Standard indexed still image)</h2>

<p>Harvester does use standard <code class="language-plaintext highlighter-rouge">.PCX</code> files in at least one confirmed place: the censorship overlay shown during certain FST sequences when gore is disabled.</p>

<h3 id="summary-11">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Standard 8-bit indexed PCX still image</td>
    </tr>
    <tr>
      <td>Confirmed consumer</td>
      <td>Native <code class="language-plaintext highlighter-rouge">load_pcx_bitmap</code>; ScummVM FST censorship overlay loader</td>
    </tr>
    <tr>
      <td>Encoding</td>
      <td>PCX RLE</td>
    </tr>
    <tr>
      <td>Palette</td>
      <td>Trailing 256-color palette block when present</td>
    </tr>
  </tbody>
</table>

<h3 id="confirmed-header-fields">Confirmed header fields</h3>

<table>
  <thead>
    <tr>
      <th>Offset</th>
      <th>Size</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x01</code></td>
      <td>byte</td>
      <td>Manufacturer (<code class="language-plaintext highlighter-rouge">0x0a</code> in sampled <code class="language-plaintext highlighter-rouge">CENSORED.PCX</code>)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x01</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x01</code></td>
      <td>byte</td>
      <td>Version (<code class="language-plaintext highlighter-rouge">0x05</code> in the sample)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x01</code></td>
      <td>byte</td>
      <td>Encoding (<code class="language-plaintext highlighter-rouge">0x01</code> = RLE)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x03</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x01</code></td>
      <td>byte</td>
      <td>Bits per pixel per plane (<code class="language-plaintext highlighter-rouge">0x08</code> in the sample)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td><code class="language-plaintext highlighter-rouge">xMin</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x06</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td><code class="language-plaintext highlighter-rouge">yMin</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x08</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td><code class="language-plaintext highlighter-rouge">xMax</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x0a</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td><code class="language-plaintext highlighter-rouge">yMax</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x41</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x01</code></td>
      <td>byte</td>
      <td>Color planes (<code class="language-plaintext highlighter-rouge">0x01</code> in the sample)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x42</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td>Bytes per line (<code class="language-plaintext highlighter-rouge">320</code> in the sample)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x80</code></td>
      <td>variable</td>
      <td>bytes</td>
      <td>RLE-compressed image data</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">EOF - 769</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x01</code></td>
      <td>byte</td>
      <td>Standard palette marker <code class="language-plaintext highlighter-rouge">0x0c</code> in the sampled file</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">EOF - 768</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x300</code></td>
      <td>bytes</td>
      <td>256-color palette block when present</td>
    </tr>
  </tbody>
</table>

<h3 id="notes-10">Notes</h3>

<ul>
  <li>Sampled <code class="language-plaintext highlighter-rouge">CENSORED.PCX</code> is a textbook single-plane <code class="language-plaintext highlighter-rouge">320 x 200</code> PCX, which is why the ScummVM port can use the generic PCX decoder for it.</li>
  <li>The native loader contains two Harvester-specific wrinkles: it skips the optional <code class="language-plaintext highlighter-rouge">0x0c</code> palette marker by seeking straight to the last <code class="language-plaintext highlighter-rouge">0x300</code> bytes, and when the logical width from <code class="language-plaintext highlighter-rouge">xMax</code> is one pixel smaller than <code class="language-plaintext highlighter-rouge">bytesPerLine</code>, it trims the padded stride byte after decode.</li>
</ul>
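
<p>Because this is plain PCX, the header fields and the run-length scheme can be sketched in a few lines of Python. The function names are mine, the header below is synthetic (built from the values listed for <code class="language-plaintext highlighter-rouge">CENSORED.PCX</code>, assuming <code class="language-plaintext highlighter-rouge">xMin</code> and <code class="language-plaintext highlighter-rouge">yMin</code> are zero), and the RLE bytes are a made-up example rather than data from the game.</p>

```python
PCX_HEADER_SIZE = 0x80


def parse_pcx_header(data):
    """Read the confirmed header fields and derive width and height."""
    def u16(offset):
        return int.from_bytes(data[offset:offset + 2], "little")

    x_min, y_min, x_max, y_max = u16(0x04), u16(0x06), u16(0x08), u16(0x0A)
    return {
        "manufacturer": data[0x00],
        "version": data[0x01],
        "encoding": data[0x02],
        "bits_per_pixel": data[0x03],
        "width": x_max - x_min + 1,
        "height": y_max - y_min + 1,
        "planes": data[0x41],
        "bytes_per_line": u16(0x42),
    }


def decode_pcx_rle(data, offset, expected_size):
    """Standard PCX RLE: a byte of 0xC0 or above is a run marker."""
    out = bytearray()
    pos = offset
    while expected_size > len(out):
        value = data[pos]
        pos += 1
        if value >= 0xC0:
            run = value & 0x3F           # low six bits hold the run length
            out.extend(bytes([data[pos]]) * run)
            pos += 1
        else:
            out.append(value)            # literal pixel
    return bytes(out[:expected_size])


def read_trailing_palette(data):
    """Take the last 0x300 bytes as 256 RGB triples, as the native loader does."""
    raw = data[-0x300:]
    return [tuple(raw[i:i + 3]) for i in range(0, 0x300, 3)]


# Synthetic header; xMin = yMin = 0 is an assumption for this example.
header_bytes = bytearray(PCX_HEADER_SIZE)
header_bytes[0x00:0x04] = bytes([0x0A, 0x05, 0x01, 0x08])
header_bytes[0x08:0x0A] = (319).to_bytes(2, "little")
header_bytes[0x0A:0x0C] = (199).to_bytes(2, "little")
header_bytes[0x41] = 0x01
header_bytes[0x42:0x44] = (320).to_bytes(2, "little")

info = parse_pcx_header(bytes(header_bytes))
pixels = decode_pcx_rle(b"\xC3\x07\x05\xC1\xC5", 0, 5)
print(info["width"], info["height"], list(pixels))
```

<p>The last two RLE bytes show the one PCX quirk worth remembering: a literal pixel value of <code class="language-plaintext highlighter-rouge">0xC0</code> or higher has to be stored as a run of length one.</p>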

<h2 id="flc-standard-autodesk-flic-animation"><code class="language-plaintext highlighter-rouge">FLC</code> (Standard Autodesk FLIC animation)</h2>

<p>Harvester does ship <code class="language-plaintext highlighter-rouge">.FLC</code> files, and the sampled files match the standard Autodesk FLIC/FLC header rather than a Harvester-specific wrapper. The current Harvester engine work has not yet recovered the exact native Harvester-side call path that consumes them, so this section stays limited to what is directly supported by sample bytes, binary strings, and the generic ScummVM FLIC decoder.</p>

<h3 id="summary-12">Summary</h3>

<table>
  <thead>
    <tr>
      <th>Property</th>
      <th>Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Format role</td>
      <td>Standard 8-bit FLIC/FLC animation files bundled with Harvester</td>
    </tr>
    <tr>
      <td>Evidence</td>
      <td>Sample files such as <code class="language-plaintext highlighter-rouge">GRAPHIC/ROOMANIM/HARVPNTR.FLC</code> and <code class="language-plaintext highlighter-rouge">GRAPHIC/FST/CHESMOV1.FLC</code>; native strings <code class="language-plaintext highlighter-rouge">Could not load flic.</code> and <code class="language-plaintext highlighter-rouge">flic.cpp</code>; hardcoded <code class="language-plaintext highlighter-rouge">.flc</code> paths in the binary</td>
    </tr>
    <tr>
      <td>Harvester-specific loader status</td>
      <td>Not yet fully recovered in current Harvester analysis</td>
    </tr>
    <tr>
      <td>Decoder in repo</td>
      <td>Generic ScummVM <code class="language-plaintext highlighter-rouge">video/flic_decoder.cpp</code></td>
    </tr>
  </tbody>
</table>

<h3 id="confirmed-standard-header-fields">Confirmed standard header fields</h3>

<table>
  <thead>
    <tr>
      <th>Offset</th>
      <th>Size</th>
      <th>Type</th>
      <th>Description</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x00</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>File size</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td>Magic <code class="language-plaintext highlighter-rouge">0xaf12</code></td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x06</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td>Frame count</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x08</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td>Width</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x0a</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td>Height</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x0c</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td>Color depth (<code class="language-plaintext highlighter-rouge">8</code> in the sampled file)</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x0e</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x02</code></td>
      <td>uint16le</td>
      <td>Flags</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x10</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Frame delay in milliseconds</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x50</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Offset of frame 1</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x54</code></td>
      <td><code class="language-plaintext highlighter-rouge">0x04</code></td>
      <td>uint32le</td>
      <td>Offset of frame 2</td>
    </tr>
    <tr>
      <td><code class="language-plaintext highlighter-rouge">0x80</code></td>
      <td>variable</td>
      <td>bytes</td>
      <td>First frame/chunk stream begins here in the generic decoder</td>
    </tr>
  </tbody>
</table>

<h3 id="notes-11">Notes</h3>

<ul>
  <li>Sampled <code class="language-plaintext highlighter-rouge">HARVPNTR.FLC</code> starts with a valid FLC header: size <code class="language-plaintext highlighter-rouge">0x214a</code>, magic <code class="language-plaintext highlighter-rouge">0xaf12</code>, <code class="language-plaintext highlighter-rouge">10</code> frames, <code class="language-plaintext highlighter-rouge">26 x 26</code>, and <code class="language-plaintext highlighter-rouge">8</code>-bit color.</li>
  <li>The decoded <code class="language-plaintext highlighter-rouge">HARVEST.SCR</code> sample does not reference <code class="language-plaintext highlighter-rouge">.FLC</code> files directly; its <code class="language-plaintext highlighter-rouge">GOFLIC</code> opcodes currently point to <code class="language-plaintext highlighter-rouge">.FST</code> files instead. That suggests <code class="language-plaintext highlighter-rouge">.FLC</code> belongs to a parallel or older asset path rather than the main shipping story-transition pipeline. This last point is an inference from the sampled script and should stay provisional.</li>
  <li>Taken together, the resource set shows three animation strata: lightweight sprite ABMs, streamed FST movies, and a residual standard FLC layer. That is a strong narrative hook for explaining how mixed-tool resource pipelines often survive into shipped games.</li>
</ul>
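
<p>For completeness, the standard header fields in the table above can be checked with a short Python sketch. The function name is hypothetical, and the test header is synthetic: it reuses the size, frame count, dimensions, and depth quoted for <code class="language-plaintext highlighter-rouge">HARVPNTR.FLC</code> and leaves every other byte at zero, so the delay and frame offsets it prints are placeholders, not the real file’s values.</p>

```python
FLC_MAGIC = 0xAF12
FLC_HEADER_SIZE = 0x80


def parse_flc_header(data):
    """Decode the standard FLC header fields listed in the table above."""
    def u16(offset):
        return int.from_bytes(data[offset:offset + 2], "little")

    def u32(offset):
        return int.from_bytes(data[offset:offset + 4], "little")

    if len(data[:FLC_HEADER_SIZE]) != FLC_HEADER_SIZE:
        raise ValueError("truncated FLC header")
    if u16(0x04) != FLC_MAGIC:
        raise ValueError("not an FLC file")
    return {
        "file_size": u32(0x00),
        "frame_count": u16(0x06),
        "width": u16(0x08),
        "height": u16(0x0A),
        "color_depth": u16(0x0C),
        "flags": u16(0x0E),
        "frame_delay_ms": u32(0x10),
        "offset_frame1": u32(0x50),
        "offset_frame2": u32(0x54),
    }


# Synthetic header using the HARVPNTR.FLC values from the notes above.
raw = bytearray(FLC_HEADER_SIZE)
raw[0x00:0x04] = (0x214A).to_bytes(4, "little")
raw[0x04:0x06] = FLC_MAGIC.to_bytes(2, "little")
raw[0x06:0x08] = (10).to_bytes(2, "little")
raw[0x08:0x0A] = (26).to_bytes(2, "little")
raw[0x0A:0x0C] = (26).to_bytes(2, "little")
raw[0x0C:0x0E] = (8).to_bytes(2, "little")

flc = parse_flc_header(bytes(raw))
print(flc)
```

<p>Running the same function over the first <code class="language-plaintext highlighter-rouge">0x80</code> bytes of a real <code class="language-plaintext highlighter-rouge">.FLC</code> file is a quick way to confirm it is standard FLIC before handing it to a generic decoder.</p>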

<blockquote class="prompt-tip">
  <p>TRY IT OUT</p>
  <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
</pre></td><td class="rouge-code"><pre>ffplay <span class="nt">-loop</span> <span class="nt">-1</span> <span class="s2">"GRAPHIC/ROOMANIM/HARVPNTR.FLC"</span>
ffmpeg <span class="nt">-i</span> <span class="s2">"GRAPHIC/ROOMANIM/HARVPNTR.FLC"</span> out.gif
</pre></td></tr></tbody></table></code></pre></div>  </div>
  <p><img src="/images/ghidra3/flc.gif" alt="" /></p>
</blockquote>]]></content><author><name></name></author><category term="Programming" /><category term="programming" /><category term="reverse-engineering" /><category term="scummvm" /><category term="ghidra" /><summary type="html"><![CDATA[Series:&nbsp;Reverse Engineering HarvesterThis review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.← Reverse Engineering Harvester with Ghidra and Codex - Part 2Reverse Engineering Harvester with Ghidra and Codex - Part 3: File Formats→ Reverse Engineering Harvester with Ghidra and Codex - Part 4: Command Opcodes Article 3 of 6 in this series.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" /><media:content medium="image" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reverse Engineering Harvester with Ghidra and Codex - Part 2</title><link href="https://www.alexbevi.com/blog/2026/03/17/reverse-engineering-harvester-with-ghidra-and-codex-part-2/" rel="alternate" type="text/html" title="Reverse Engineering Harvester with Ghidra and Codex - Part 2" /><published>2026-03-17T19:22:34-04:00</published><updated>2026-04-29T09:45:29-04:00</updated><id>https://www.alexbevi.com/blog/2026/03/17/reverse-engineering-harvester-with-ghidra-and-codex-part-2</id><content type="html" xml:base="https://www.alexbevi.com/blog/2026/03/17/reverse-engineering-harvester-with-ghidra-and-codex-part-2/"><![CDATA[<blockquote class="prompt-tip mb-6">
  <strong>Series:&nbsp;<a href="/blog/2026/03/14/reverse-engineering-a-dos-game-with-ghidra-and-codex/">Reverse Engineering Harvester</a></strong><p>This review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.</p><ul class="list-none space-y-1"><li>← <a href="/blog/2026/03/14/reverse-engineering-a-dos-game-with-ghidra-and-codex/">Reverse Engineering a DOS Game with Ghidra and Codex</a></li><li><strong>Reverse Engineering Harvester with Ghidra and Codex - Part 2</strong></li><li>→ <a href="/blog/2026/03/23/reverse-engineering-harvester-with-ghidra-and-codex-part-3-file-formats/">Reverse Engineering Harvester with Ghidra and Codex - Part 3: File Formats</a></li></ul>

  <p>Article 2 of 6 in this series.</p>
</blockquote>

<p>According to the <a href="https://harvester.telepedia.net/wiki/Manual">Harvester manual</a>, your health is represented by a picture of Steve in the inventory that gets bloodier the more damage you take. Since we’re using a Ghidra disassembly and Codex to build out our ScummVM engine, let’s have a look at how easy adding this functionality can be.</p>

<p><img src="/images/ghidra2/SCR-20260318-gkwu.png" alt="" /></p>

<p>First let’s confirm that our implementation doesn’t support this functionality just yet. This is easy enough to do as we’ve already started working on the inventory system, so clicking on Steve should bring up the expected view - missing photo and all.</p>

<p><img src="/images/ghidra2/SCR-20260317-quvr.png" alt="" /></p>

<p>Comparing the original running in DOSBox (left) with our current engine (right), we can clearly see what the expected outcome should be. The prompt we’re going to use is as follows:</p>

<blockquote>
  <p>when you click on the inventory view, Steve has his health tracked in the photo in the bottom left. Check back in Ghidra to trace health management, and if there are any unnamed functions or data structures that apply, rename them once you have high confidence. Implement these changes engine-side and commit changes when ready.</p>
</blockquote>

<p><img src="/images/ghidra2/SCR-20260317-qwys.png" alt="" /></p>

<p>Just like magic, after about 13 minutes of processing, Codex has revisited our existing disassembly in Ghidra, clarified any gaps there first, carried that logic back to Visual Studio Code, and proposed engine changes to implement the requested functionality.</p>

<p><img src="/images/ghidra2/SCR-20260317-qvjx.png" alt="" /></p>

<p>The first pass is honestly pretty impressive: other than a visual anomaly in the form of a black box seemingly being rendered out of place, the health indicator is working already!</p>

<p>Since we’re taking side-by-side screenshots of a reference implementation (the original running in DOSBox) to compare our re-implementation with, we can actually attach that screenshot as context to steer our next prompt:</p>

<blockquote>
  <p>you can see from the attached screenshot of dosbox (left) and our engine (right) that the health indicator photo is correctly placed, but there’s a black box under it that appears to be either an incorrect transparency or a misrotated container element. Revist Ghidra for inventory and health drawing and determine how to fix this</p>
</blockquote>

<p><img src="/images/ghidra2/SCR-20260317-qxoa.png" alt="" /></p>

<p>Codex is making a number of tool calls, so I’ve got <a href="https://www.rtk-ai.app/"><code class="language-plaintext highlighter-rouge">rtk</code></a> installed to compress some of the output to extend my context window slightly. It only takes a couple of minutes, but the model is able to determine the cause of the issue based on the state of our reverse-engineered executable in Ghidra.</p>

<p><img src="/images/ghidra2/SCR-20260317-qxzk.png" alt="" /></p>

<p>Having invested the time to map out the data structures and function calls in Ghidra makes reimplementing game logic a lot more efficient, and the Ghidra MCP bridge allows us to keep doing this quickly and effectively in Visual Studio Code.</p>

<p><img src="/images/ghidra2/SCR-20260317-qyip.png" alt="" /></p>

<p>What this practically translates to is a fairly good implementation (see full <a href="https://github.com/alexbevi/scummvm/commit/df15ab715792ba0a26f12430ac489619f9be2857">commit</a>).</p>

<p>Ten years ago while I was still muddling my way through <a href="/blog/2015/04/08/asylum-engine-update/"><code class="language-plaintext highlighter-rouge">asylum</code> engine updates</a> in IDA Pro, making this type of progress would have taken me weeks. Now, in about 30 minutes I can achieve exponentially more!</p>]]></content><author><name></name></author><category term="Programming" /><category term="programming" /><category term="reverse-engineering" /><category term="scummvm" /><category term="ghidra" /><summary type="html"><![CDATA[Series:&nbsp;Reverse Engineering HarvesterThis review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.← Reverse Engineering a DOS Game with Ghidra and CodexReverse Engineering Harvester with Ghidra and Codex - Part 2→ Reverse Engineering Harvester with Ghidra and Codex - Part 3: File Formats Article 2 of 6 in this series.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" /><media:content medium="image" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Reverse Engineering a DOS Game with Ghidra and Codex</title><link href="https://www.alexbevi.com/blog/2026/03/14/reverse-engineering-a-dos-game-with-ghidra-and-codex/" rel="alternate" type="text/html" title="Reverse Engineering a DOS Game with Ghidra and Codex" /><published>2026-03-14T06:13:05-04:00</published><updated>2026-04-29T09:45:29-04:00</updated><id>https://www.alexbevi.com/blog/2026/03/14/reverse-engineering-a-dos-game-with-ghidra-and-codex</id><content type="html" xml:base="https://www.alexbevi.com/blog/2026/03/14/reverse-engineering-a-dos-game-with-ghidra-and-codex/"><![CDATA[<blockquote 
class="prompt-tip mb-6">
  <strong>Series:&nbsp;<a href="/blog/2026/03/14/reverse-engineering-a-dos-game-with-ghidra-and-codex/">Reverse Engineering Harvester</a></strong><p>This review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.</p><ul class="list-none space-y-1"><li><strong>Reverse Engineering a DOS Game with Ghidra and Codex</strong></li><li>→ <a href="/blog/2026/03/17/reverse-engineering-harvester-with-ghidra-and-codex-part-2/">Reverse Engineering Harvester with Ghidra and Codex - Part 2</a></li></ul>

  <p>Article 1 of 6 in this series.</p>
</blockquote>

<p>As part of my series called <a href="/blog/2021/07/28/adventure-games-1980-1999/">“Let’s Adventure!”</a> I had a chance to revisit a lot of classic adventure games I loved growing up, but also experience some I’d missed or never heard of. One of those games was <a href="/blog/2024/02/23/harvester/">Harvester</a>, which was just the right mix of weird characters and settings with a bizarre plot tying everything together.</p>

<p><img src="/images/adventure/harvester/harvest_012.png" alt="" /></p>

<p>Though I wouldn’t say the game was great, it was interesting enough to keep me engaged, and even after playing through it, it still remains top of mind.</p>

<p>With the advent of agentic development tools like <a href="https://claude.com/product/claude-code">Claude Code</a> or <a href="https://openai.com/codex/">Codex</a>, I figured I’d take a crack at reverse engineering Harvester with the goal of eventually reimplementing it as a game engine in <a href="https://www.scummvm.org/">ScummVM</a>.</p>

<h2 id="tools">Tools</h2>

<ul>
  <li><a href="https://code.visualstudio.com/">Visual Studio Code</a></li>
  <li>Codex (used via Visual Studio Code)</li>
  <li><a href="https://github.com/nationalsecurityagency/ghidra">Ghidra</a></li>
  <li><a href="https://github.com/bethington/ghidra-mcp">Ghidra MCP</a></li>
  <li><a href="https://github.com/yetmorecode/ghidra-lx-loader">Ghidra Loader for the LX/LE executable file format</a></li>
</ul>

<h2 id="getting-started">Getting Started</h2>

<p>The first step is almost always to load the EXE into Ghidra and kick off an initial analysis. Once the analysis is complete, I like to get a sense of what types of string constants are exposed that I may be able to anchor some logic to by inspecting the <a href="https://www.ghidradocs.com/12.0.4_PUBLIC/help/Base/help/topics/ViewStringsPlugin/ViewStringsPlugin.htm">Defined Strings</a>.</p>

<p>What I’m looking for on a first pass is:</p>

<ul>
  <li>resource filenames as constants (ex: <code class="language-plaintext highlighter-rouge">HARVEST.DAT</code>) as these can help point to where file I/O routines are</li>
  <li>error messages</li>
  <li>inventory item labels</li>
  <li>dialogue text</li>
  <li>names that might identify internal data structures</li>
  <li>character names</li>
</ul>
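
<p>The same first pass can be roughly approximated outside Ghidra. The snippet below is only a sketch of the idea, not how Ghidra’s Defined Strings view works internally: it pulls runs of printable ASCII out of a buffer, much like the Unix <code class="language-plaintext highlighter-rouge">strings</code> tool. The function name <code class="language-plaintext highlighter-rouge">extractStrings</code>, the minimum length of 4, and the sample bytes are all assumptions made for illustration.</p>

```javascript
"use strict";

// Return every run of printable ASCII (0x20-0x7e) that is at least
// minLen characters long. Hypothetical helper, for illustration only.
function extractStrings(buf, minLen = 4) {
  const printableRun = new RegExp("[\\x20-\\x7e]{" + minLen + ",}", "g");
  // latin1 maps each byte to exactly one character, so offsets stay aligned.
  return buf.toString("latin1").match(printableRun) || [];
}

// Made-up sample: binary noise, a resource filename, and a run too short to keep.
const sample = Buffer.concat([
  Buffer.from([0x00, 0x01, 0x02]),
  Buffer.from("HARVEST.DAT", "latin1"),
  Buffer.from([0x00]),
  Buffer.from("ab", "latin1"),
  Buffer.from([0xff]),
]);

console.log(extractStrings(sample)); // [ 'HARVEST.DAT' ]
```

<p>Against a real executable you would pass the result of <code class="language-plaintext highlighter-rouge">fs.readFileSync</code> instead of the sample buffer. If a scan like this turns up almost nothing recognizable, that is a hint the interesting data is packed, bound, or otherwise wrapped.</p>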

<p>Since I didn’t see any of these, I tried to decompress the executable in DOSBox using <a href="https://bencastricum.nl/unp/"><code class="language-plaintext highlighter-rouge">UNP.EXE</code></a>. This didn’t do anything as compression wasn’t what was obfuscating the strings, but what I could see looked like memory management details - which led me to believe the game was using a <a href="https://en.wikipedia.org/wiki/DOS_extender">DOS extender</a> like <code class="language-plaintext highlighter-rouge">DOS/4GW</code>.</p>

<p><img src="/images/ghidra1/ghidra-exe-extended.png" alt="" /></p>

<p><code class="language-plaintext highlighter-rouge">DOS/4GW</code> was the Watcom-packaged subset of Tenberry’s <code class="language-plaintext highlighter-rouge">DOS/4G</code>: a <a href="https://openwatcom.org/ftp/manuals/current/pguide.pdf">32-bit DOS extender customized for the Watcom toolchain</a>. Its job was to let a DOS program run in 386 protected mode with a flat, zero-based memory model, so developers could stop fighting 16-bit segmented memory and work with much larger address spaces. Just as importantly for games, <code class="language-plaintext highlighter-rouge">DOS/4GW</code> still mapped the first megabyte of physical memory into a shared linear address space, which meant code could still reach things like video RAM and BIOS data when needed.</p>

<h2 id="unbinding-the-extended-exe">Unbinding the Extended EXE</h2>

<p>To make any meaningful progress, we need Ghidra to be able to analyze the unextended EXE. After looking around I found a blog post describing how to <a href="https://blog.torh.net/2015/10/30/disassemble-dos4gw/">“Disassemble DOS/4GW”</a> which hinted at being able to use the SUNSYS Bind Utility to extract a <a href="https://en.wikipedia.org/wiki/Linear_Executable">Linear Executable</a> from the extended EXE.</p>

<p>You can find a copy of the bind utility (<code class="language-plaintext highlighter-rouge">SB.EXE</code>) with the open source <a href="https://web.archive.org/web/20210726190857/https://dos32a.narechk.net/index_en.html">DOS32A</a> DOS extender. Once downloaded, you can use this directly via <a href="https://www.dosbox.com/">DOSBox</a> to unbind the EXE.</p>

<p><img src="/images/ghidra1/dosbox-sb32a.png" alt="" /></p>

<p>When running this against <code class="language-plaintext highlighter-rouge">HARVEST.EXE</code> it will produce a <code class="language-plaintext highlighter-rouge">HARVEST.LE</code>, which is the linear executable. Once we have this the next logical step would be to try and load it in Ghidra, but doing this doesn’t quite work as Ghidra thinks the LE file is raw data.</p>
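
<p>Before reaching for a loader, it can help to confirm what kind of file you actually have. The following is a minimal sketch that assumes the conventional layout: a two-byte signature at the start of the file and, for <code class="language-plaintext highlighter-rouge">MZ</code> stubs, a 32-bit little-endian offset at <code class="language-plaintext highlighter-rouge">0x3C</code> pointing at the next header. DOS/4GW-bound executables may chain headers differently, so treat the result as a hint only. The names <code class="language-plaintext highlighter-rouge">signatureAt</code> and <code class="language-plaintext highlighter-rouge">describeExecutable</code> are hypothetical.</p>

```javascript
"use strict";

// Signatures this sketch recognises; anything else is reported as "unknown".
const KNOWN = new Set(["MZ", "LE", "LX"]);

// Read a two-byte signature at the given offset, if the buffer is long enough.
function signatureAt(buf, offset) {
  if (buf.length >= offset + 2) {
    const sig = buf.toString("latin1", offset, offset + 2);
    if (KNOWN.has(sig)) return sig;
  }
  return "unknown";
}

// Report the outer signature and, for MZ stubs, the signature of the header
// that the 0x3C field points at (assumed conventional layout).
function describeExecutable(buf) {
  const outer = signatureAt(buf, 0);
  if (outer !== "MZ" || 0x40 > buf.length) return { outer, inner: null };
  const next = buf.readUInt32LE(0x3c);
  return { outer, inner: signatureAt(buf, next) };
}

// Example (not run here):
//   describeExecutable(require("node:fs").readFileSync("HARVEST.LE"))
```

<p>If the file is reported as <code class="language-plaintext highlighter-rouge">LE</code> or <code class="language-plaintext highlighter-rouge">LX</code>, either directly or behind an <code class="language-plaintext highlighter-rouge">MZ</code> stub, the unbinding step most likely worked and the remaining problem is simply that Ghidra lacks a loader for the format.</p>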

<h2 id="decompiling-a-linear-executable">Decompiling a Linear Executable</h2>

<p>To successfully start analyzing <code class="language-plaintext highlighter-rouge">HARVEST.LE</code>, we’ll first need to install the <a href="https://github.com/yetmorecode/ghidra-lx-loader">Ultimate Ghidra Loader for the LX/LE executable file format</a> extension.</p>

<p><img src="/images/ghidra1/ghidra-import-le.png" alt="" /></p>

<p>After doing this and restarting Ghidra, the LE EXE format should be properly detected, and we can load this and begin our analysis. Since the goal is to develop a working <a href="https://wiki.scummvm.org/index.php/Engines">ScummVM game engine</a>, we’ll be working on a <a href="https://github.com/alexbevi/scummvm/tree/harvester"><code class="language-plaintext highlighter-rouge">harvester</code> branch of a ScummVM fork</a>.</p>

<h2 id="working-with-ghidra-via-mcp">Working with Ghidra via MCP</h2>

<p><img src="/images/ghidra1/ghidra-decompile-le.png" alt="" /></p>

<p>The <code class="language-plaintext highlighter-rouge">HARVEST.LE</code> executable has now been analyzed and we have a much more informative listing of defined strings to anchor our reverse engineering efforts on. Since we’ll be using Codex directly via Visual Studio Code, the next step is to set up Ghidra and <a href="https://github.com/bethington/ghidra-mcp">Ghidra MCP</a>.</p>

<p>This is fairly straightforward:</p>

<ul>
  <li>Download Ghidra and unzip</li>
  <li>In Visual Studio Code, install the <code class="language-plaintext highlighter-rouge">openai.chatgpt</code> extension</li>
  <li>Open the Codex panel and just start chatting:
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre>  I've installed Ghidra at /path/to/extracted/ghidra and I want to work with it using 
  MCP. Download GhidraMCP from https://github.com/bethington/ghidra-mcp and install 
  it. Verify this is working once complete.
</pre></td></tr></tbody></table></code></pre></div>    </div>
    <p><img src="/images/ghidra1/vscode-01.png" alt="" /></p>
  </li>
</ul>

<p>With GhidraMCP installed and working, we can dig into our initial reverse engineering attempt. I started off by developing an <a href="https://gist.github.com/alexbevi/07560b7e82dd73527f4fc59ce1ed9972"><code class="language-plaintext highlighter-rouge">AGENTS.md</code></a> I could use specifically for this initiative.</p>

<p>To make it easier to port logic from the original game EXE to a ScummVM engine we’ll need to identify functions, data structures, subsystems, state management, resource handling and many other components that made up the game - so we kick things off with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="rouge-code"><pre>Read the AGENTS.md file, start at the application entry point in Ghidra and begin 
scanning the available functions (FUN_*). Based on the xrefs to those functions, 
as well as how defined strings are being used and where DOS interrupts are being 
invoked, when you have high confidence in a function's purpose, rename it with a 
meaningful name.
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">AGENTS.md</code> file outlines operating parameters such as keeping track of our findings in an <code class="language-plaintext highlighter-rouge">ARCHITECTURE.md</code> file and ongoing progress in a <code class="language-plaintext highlighter-rouge">TRACKER.md</code> file. As we progress with subsequent prompts, more of the linear executable in Ghidra will be decompiled, making it easier to work with as a source of truth for our re-implementation efforts.</p>

<p><img src="/images/ghidra1/ghidra-naming.png" alt="" /></p>

<p>Using Ghidra via the Ghidra MCP bridge makes it very easy to incrementally analyze the game executable and start piecing together how it works.</p>

<h2 id="scaffolding-a-game-engine">Scaffolding a Game Engine</h2>

<p>Once we’ve made sufficient progress unwinding some of the original main game loop and resource handling, we can start wiring up an engine using additional prompts. The <code class="language-plaintext highlighter-rouge">ARCHITECTURE.md</code> we’re evolving along with direct access to Ghidra via MCP can be used to first:</p>

<ul>
  <li>create a skeletal engine (based on ScummVM’s <a href="https://wiki.scummvm.org/index.php/HOWTO-Engines">HOWTO-Engines</a> guidance)</li>
  <li>add basic resource handling</li>
  <li>add basic audio/video decoding</li>
  <li>play our first video resources</li>
</ul>

<p><a href="https://tcrf.net/Harvester">The Cutting Room Floor</a> has an entry on Harvester, as well as some <a href="https://tcrf.net/Notes:Harvester">additional notes</a> that can be shared with Codex to inform file format disassembly and re-implementation.</p>

<p>I was actually pleasantly surprised with how easy it was to make progress using this configuration, as targeted prompts more often than not produced positive, actionable results. Within a day of starting this project I was able to get the intro videos playing, the first scene to render and the background music and sound effects working.</p>

<p><img src="/images/ghidra1/scummvm-early.png" alt="" /></p>

<p>There’s plenty more to do (such as fixing the palette), but the main thing to highlight here is that understanding assembly language doesn’t need to be a barrier to undertaking these types of reverse engineering initiatives anymore.</p>

<!-- 
https://github.com/david-offord/harvester-bm-converter/tree/master
https://www.polygon.com/fmv-harvester-brilliantly-and-brutally-critiques-censorship/ -->]]></content><author><name></name></author><category term="Programming" /><category term="programming" /><category term="reverse-engineering" /><category term="scummvm" /><category term="ghidra" /><summary type="html"><![CDATA[Series:&nbsp;Reverse Engineering HarvesterThis review is part of the Reverse Engineering Harvester series, where I document my journey of reverse engineering the 1996 DOS game Harvester to re-implement its game engine in ScummVM.Reverse Engineering a DOS Game with Ghidra and Codex→ Reverse Engineering Harvester with Ghidra and Codex - Part 2 Article 1 of 6 in this series.]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" /><media:content medium="image" url="https://www.alexbevi.com/images/ghidra1/harvester_reverse_engineering_banner_1200x600.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry><entry><title type="html">Cloudflare + MongoDB: How to fix ‘Error: Dynamic require of “punycode/” is not supported’</title><link href="https://www.alexbevi.com/blog/2025/12/22/cloudflare-plus-mongodb-how-to-fix-error-dynamic-require-of-punycode-slash-is-not-supported/" rel="alternate" type="text/html" title="Cloudflare + MongoDB: How to fix ‘Error: Dynamic require of “punycode/” is not supported’" /><published>2025-12-22T10:54:11-05:00</published><updated>2026-04-29T09:45:29-04:00</updated><id>https://www.alexbevi.com/blog/2025/12/22/cloudflare-plus-mongodb-how-to-fix-error-dynamic-require-of-punycode-slash-is-not-supported</id><content type="html" xml:base="https://www.alexbevi.com/blog/2025/12/22/cloudflare-plus-mongodb-how-to-fix-error-dynamic-require-of-punycode-slash-is-not-supported/"><![CDATA[<p>If you’ve followed my <a href="/blog/2025/03/25/cloudflare-workers-and-mongodb/">previous post</a> to try and connect to MongoDB from Cloudflare 
Workers, it’s possible you’ve come across the following issue:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
</pre></td><td class="rouge-code"><pre>Error: Dynamic require of "punycode/" is not supported
</pre></td></tr></tbody></table></code></pre></div></div>

<p>The TL;DR is there is an issue with how <code class="language-plaintext highlighter-rouge">@cloudflare/vite-plugin</code> is <a href="https://github.com/jsdom/tr46/pull/73">processing an import with a trailing slash within the <code class="language-plaintext highlighter-rouge">tr46</code> library</a>, which is a transitive dependency of the MongoDB Node.js driver. The current solution is to patch this out until a proper fix is in place.</p>

<h3 id="reproduction">Reproduction</h3>

<p>Let’s begin with a new application we can use as a minimum reproduction. Chances are you’ve already got an application ready that’s hitting this issue, but if not we can verify this behavior by simply <a href="https://developers.cloudflare.com/workers/framework-guides/web-apps/react-router/">creating a new React Router app using <code class="language-plaintext highlighter-rouge">create-cloudflare</code></a> as follows, then adding the MongoDB Node.js driver as a dependency and importing it.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
</pre></td><td class="rouge-code"><pre><span class="c"># create a new react router app</span>
npm create cloudflare@latest <span class="nt">--</span> my-react-router-app <span class="nt">--framework</span><span class="o">=</span>react-router
<span class="nb">cd </span>my-react-router-app
<span class="c"># install mongodb</span>
npm <span class="nb">install </span>mongodb <span class="nt">--save</span>
<span class="c"># prepend an import to the workers/app.ts file</span>
<span class="nb">printf</span> <span class="s1">'import { MongoClient } from "mongodb";\n%s'</span> <span class="s2">"</span><span class="si">$(</span><span class="nb">cat </span>workers/app.ts<span class="si">)</span><span class="s2">"</span> <span class="o">&gt;</span> workers/app.ts
<span class="c"># update wrangler.jsonc with compatibility flags to support SSR</span>
<span class="nb">sed</span> <span class="nt">-i</span> <span class="s1">''</span> <span class="s1">'/"compatibility_date": "2025-04-04"/a\
  "compatibility_flags": ["nodejs_compat"],'</span> wrangler.jsonc
</pre></td></tr></tbody></table></code></pre></div></div>

<p>With a freshly bootstrapped application, let’s try running it to see what happens.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="rouge-code"><pre>npm run dev
<span class="o">&gt;</span> dev
<span class="o">&gt;</span> react-router dev

11:15:07 AM <span class="o">[</span>vite] <span class="o">(</span>ssr<span class="o">)</span> Re-optimizing dependencies because vite config has changed
11:15:08 AM <span class="o">[</span>vite] <span class="o">(</span>ssr<span class="o">)</span> ✨ new dependencies optimized: mongodb
11:15:08 AM <span class="o">[</span>vite] <span class="o">(</span>ssr<span class="o">)</span> ✨ optimized dependencies changed. reloading
<span class="o">[</span>vite] program reload
Error: Dynamic require of <span class="s2">"punycode/"</span> is not supported
    at null.&lt;anonymous&gt; <span class="o">(</span>/Users/alex/Temp/my-react-router-app/node_modules/.vite/deps_ssr/chunk-PLDDJCW6.js:11:9<span class="o">)</span>
    at node_modules/tr46/index.js <span class="o">(</span>/Users/alex/Temp/my-react-router-app/node_modules/tr46/index.js:3:18<span class="o">)</span>
    at __require2 <span class="o">(</span>/Users/alex/Temp/my-react-router-app/node_modules/.vite/deps_ssr/chunk-PLDDJCW6.js:17:50<span class="o">)</span>
    at node_modules/whatwg-url/lib/url-state-machine.js <span class="o">(</span>/Users/alex/Temp/my-react-router-app/node_modules/whatwg-url/lib/url-state-machine.js:2:14<span class="o">)</span>
    at __require2 <span class="o">(</span>/Users/alex/Temp/my-react-router-app/node_modules/.vite/deps_ssr/chunk-PLDDJCW6.js:17:50<span class="o">)</span>
    at node_modules/whatwg-url/lib/URL-impl.js <span class="o">(</span>/Users/alex/Temp/my-react-router-app/node_modules/whatwg-url/lib/URL-impl.js:2:13<span class="o">)</span>
    at __require2 <span class="o">(</span>/Users/alex/Temp/my-react-router-app/node_modules/.vite/deps_ssr/chunk-PLDDJCW6.js:17:50<span class="o">)</span>
    at node_modules/whatwg-url/lib/URL.js <span class="o">(</span>/Users/alex/Temp/my-react-router-app/node_modules/whatwg-url/lib/URL.js:499:14<span class="o">)</span>
    at __require2 <span class="o">(</span>/Users/alex/Temp/my-react-router-app/node_modules/.vite/deps_ssr/chunk-PLDDJCW6.js:17:50<span class="o">)</span>
    at node_modules/whatwg-url/webidl2js-wrapper.js <span class="o">(</span>/Users/alex/Temp/my-react-router-app/node_modules/whatwg-url/webidl2js-wrapper.js:3:13<span class="o">)</span> <span class="o">{</span>
  <span class="o">[</span>cause]: undefined
<span class="o">}</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Vite is complaining that the dynamic require of “punycode/” is not supported. The trailing slash following “punycode” is interesting, but we should first see where it’s being imported. We can do this by using <a href="https://docs.npmjs.com/cli/v7/commands/npm-ls"><code class="language-plaintext highlighter-rouge">npm ls</code></a> to quickly narrow down usage of <code class="language-plaintext highlighter-rouge">punycode</code> to the <code class="language-plaintext highlighter-rouge">tr46</code> library:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre>npm <span class="nb">ls </span>punycode
my-react-router-app@ /Users/alex/Temp/my-react-router-app
└─┬ mongodb@7.0.0
  └─┬ mongodb-connection-string-url@7.0.0
    └─┬ whatwg-url@14.2.0
      └─┬ tr46@5.1.1
        └── punycode@2.3.1
</pre></td></tr></tbody></table></code></pre></div></div>

<p>Inspecting the <code class="language-plaintext highlighter-rouge">tr46</code> library at <a href="https://github.com/jsdom/tr46/blob/main/index.js">https://github.com/jsdom/tr46/blob/main/index.js</a> shows the trailing slash on the import as well:</p>

<div class="language-js highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="dl">"</span><span class="s2">use strict</span><span class="dl">"</span><span class="p">;</span>

<span class="kd">const</span> <span class="nx">punycode</span> <span class="o">=</span> <span class="nf">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">punycode/</span><span class="dl">"</span><span class="p">);</span> <span class="c1">// &lt;--- this is the line in question</span>
<span class="kd">const</span> <span class="nx">regexes</span> <span class="o">=</span> <span class="nf">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">./lib/regexes.js</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">const</span> <span class="nx">mappingTable</span> <span class="o">=</span> <span class="nf">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">./lib/mappingTable.json</span><span class="dl">"</span><span class="p">);</span>
<span class="kd">const</span> <span class="p">{</span> <span class="nx">STATUS_MAPPING</span> <span class="p">}</span> <span class="o">=</span> <span class="nf">require</span><span class="p">(</span><span class="dl">"</span><span class="s2">./lib/statusMapping.js</span><span class="dl">"</span><span class="p">);</span>
<span class="c1">// ...</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>I initially opened a PR at <a href="https://github.com/jsdom/tr46/pull/73">https://github.com/jsdom/tr46/pull/73</a> to sort this out, but the maintainer pointed out that the issue is with Vite, so we’ll need to look elsewhere for a solution. The trailing slash (introduced in commit <a href="https://github.com/jsdom/tr46/commit/fef6e95243caaa0e46a1aa42fa21af6caef11e51"><code class="language-plaintext highlighter-rouge">fef6e95</code></a>) was likely added to address <code class="language-plaintext highlighter-rouge">punycode</code> deprecation warnings such as the one described in <a href="https://github.com/jsdom/tr46/issues/63">https://github.com/jsdom/tr46/issues/63</a>. For more info on those deprecations see <a href="https://medium.com/@asimabas96/solving-the-punycode-module-is-deprecated-issue-in-node-js-93437637948a">“Solving the "Punycode Module is Deprecated" Issue in Node.js”</a>.</p>
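
<p>To see why the trailing slash matters, it helps to look at how Node.js classifies the two specifiers. As I understand it, a bare <code class="language-plaintext highlighter-rouge">punycode</code> names the deprecated core module, while <code class="language-plaintext highlighter-rouge">punycode/</code> is not a core module name, so it gets resolved from <code class="language-plaintext highlighter-rouge">node_modules</code> instead. The sketch below uses <code class="language-plaintext highlighter-rouge">isBuiltin</code> from <code class="language-plaintext highlighter-rouge">node:module</code>, which assumes a reasonably recent Node.js release; the <code class="language-plaintext highlighter-rouge">report</code> variable is just for illustration.</p>

```javascript
"use strict";

const { isBuiltin } = require("node:module");

// Classify both spellings of the specifier. A specifier that is not a
// built-in is looked up on disk (node_modules) by the CommonJS resolver.
const report = ["punycode", "punycode/"].map((specifier) => ({
  specifier,
  builtin: isBuiltin(specifier),
}));

console.log(report);
```

<p>Plain Node.js handles the second form without complaint, which fits with the maintainer’s position that the incompatibility lies in the bundling toolchain rather than in <code class="language-plaintext highlighter-rouge">tr46</code> itself.</p>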

<h3 id="patching">Patching</h3>

<p>We’re going to solve this issue in a roundabout fashion using <a href="https://www.npmjs.com/package/patch-package"><code class="language-plaintext highlighter-rouge">patch-package</code></a> to modify the <code class="language-plaintext highlighter-rouge">punycode</code> import directly in our <code class="language-plaintext highlighter-rouge">node_modules</code> and then have a <code class="language-plaintext highlighter-rouge">postinstall</code> script that will ensure the patch is consistently applied.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
</pre></td><td class="rouge-code"><pre><span class="c"># install patch-package</span>
npm <span class="nb">install </span>patch-package
<span class="c"># update package.json to run patch-package as well as cf-typegen (which is there by default)</span>
npm pkg <span class="nb">set </span>scripts.postinstall<span class="o">=</span><span class="s2">"patch-package &amp;&amp; npm run cf-typegen"</span>
<span class="c"># update node_modules/tr46/index.js to remove the trailing slash from the import (sed -i '' is the BSD/macOS form; GNU sed takes -i with no argument)</span>
<span class="nb">sed</span> <span class="nt">-i</span> <span class="s1">''</span> <span class="s1">'s/require("punycode\/")/require("punycode")/g'</span> node_modules/tr46/index.js
<span class="c"># create a patch for the tr46 package based on the above change</span>
npx patch-package tr46
<span class="c"># reinstall and apply patches</span>
npm <span class="nb">install</span>
</pre></td></tr></tbody></table></code></pre></div></div>

<p>That should do it! When we run <code class="language-plaintext highlighter-rouge">npm install</code> it will also run the <code class="language-plaintext highlighter-rouge">postinstall</code>, which will apply the patch we just created.</p>
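
<p>The <code class="language-plaintext highlighter-rouge">sed -i ''</code> invocation above uses BSD/macOS syntax, so on other platforms a small Node.js script may be easier to run. This is a minimal sketch of the same edit rather than part of <code class="language-plaintext highlighter-rouge">patch-package</code>; the helper names <code class="language-plaintext highlighter-rouge">stripTrailingSlash</code> and <code class="language-plaintext highlighter-rouge">patchFile</code> are hypothetical, and unlike the <code class="language-plaintext highlighter-rouge">sed</code> expression it also accepts single-quoted specifiers.</p>

```javascript
"use strict";

const fs = require("node:fs");

// Matches require("punycode/") written with either quote style.
const TRAILING_SLASH_REQUIRE = /require\((["'])punycode\/\1\)/g;

// Pure helper: return the source with the trailing slash removed.
function stripTrailingSlash(source) {
  return source.replace(TRAILING_SLASH_REQUIRE, "require($1punycode$1)");
}

// Rewrite the file in place only if something changed; returns true when
// the file was modified.
function patchFile(path) {
  const before = fs.readFileSync(path, "utf8");
  const after = stripTrailingSlash(before);
  if (after !== before) fs.writeFileSync(path, after);
  return after !== before;
}

// Example (not run here): patchFile("node_modules/tr46/index.js")
```

<p>After running <code class="language-plaintext highlighter-rouge">patchFile</code> against <code class="language-plaintext highlighter-rouge">node_modules/tr46/index.js</code>, the <code class="language-plaintext highlighter-rouge">npx patch-package tr46</code> step works the same as before. Because the replacement is idempotent, running it a second time leaves the file untouched and returns <code class="language-plaintext highlighter-rouge">false</code>.</p>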

<h3 id="summary">Summary</h3>

<p>Though patching transitive dependencies to work around an issue like this is not ideal, it does offer a path forward for anyone hitting this specific error. To summarize what we did to address the issue:</p>

<ol>
  <li>Install the <code class="language-plaintext highlighter-rouge">patch-package</code> library (<code class="language-plaintext highlighter-rouge">npm install patch-package</code>)</li>
  <li>Update your <code class="language-plaintext highlighter-rouge">package.json</code>’s <code class="language-plaintext highlighter-rouge">scripts.postinstall</code> to prepend a <code class="language-plaintext highlighter-rouge">patch-package</code> script to any <code class="language-plaintext highlighter-rouge">postinstall</code> scripts that may already be present</li>
  <li>Modify <code class="language-plaintext highlighter-rouge">node_modules/tr46/index.js</code> to remove the trailing <code class="language-plaintext highlighter-rouge">/</code> from <code class="language-plaintext highlighter-rouge">require("punycode/")</code></li>
  <li>Create the patch by running <code class="language-plaintext highlighter-rouge">npx patch-package tr46</code></li>
  <li>Ensure the patch is applied by running <code class="language-plaintext highlighter-rouge">npm install</code></li>
</ol>

<p>Hopefully we can get this sorted out more cleanly (reported as <a href="https://github.com/cloudflare/workers-sdk/issues/11751">https://github.com/cloudflare/workers-sdk/issues/11751</a>), but in the meantime feel free to use this approach if you find it suitable.</p>]]></content><author><name></name></author><category term="MongoDB" /><category term="mongodb" /><category term="drivers" /><category term="node" /><category term="nodejs" /><category term="javascript" /><category term="typescript" /><summary type="html"><![CDATA[If you’ve followed my previous post to try and connect to MongoDB from Cloudflare workers, it’s possible you’ve come across the following issue:]]></summary><media:thumbnail xmlns:media="http://search.yahoo.com/mrss/" url="https://www.alexbevi.com/images/mongodb-cloudflare.png" /><media:content medium="image" url="https://www.alexbevi.com/images/mongodb-cloudflare.png" xmlns:media="http://search.yahoo.com/mrss/" /></entry></feed>