Organizational memory is the first agent system I would build

Every agent answers from something. Make it the company.

Jun 07, 2026

The views expressed here are my own and are not related to or reflective of my work or any organization I am affiliated with.

I used to think the first serious agent in a company should be operational.

A support triage agent. A sales research agent. A coding agent. A weekly reporting agent.

Those still matter. But the more I work with agents, the more I think I had the order wrong.

Most teams start by asking which task an agent can take off their plate. The better first question is what context would make any agent worth trusting.

The first thing I would build is organizational memory.

It does not sound exciting. It sounds like documentation, and most documentation goes stale and unread.

But every useful agent hits the same wall. It does not know what the organization knows. It does not know why a decision was made, or which plan is dead but still sitting in a doc, or the difference between a stale priority and a current constraint.

So it fills the gap with plausible text.

Call this the Tourist Problem. The output is polished. The reasoning sounds coherent. It reads well. And none of it is grounded in the organization. It is grounded in the model. It is a tourist describing a city it has never lived in.

The model was not in the meetings.

The problem is not that companies lack information

Most have too much.

Docs. Slack threads. Meeting notes. Jira tickets. Notion pages. Customer calls. Postmortems. A file named Q3 Strategy Final v2 Actually Final.

The problem is not scarcity. It is retrieval, interpretation, and continuity.

People know things, but the knowledge is scattered. Decisions exist, but the reasoning is buried. New people inherit the artifacts without the conversations that produced them.

A human compensates for this socially. We remember last month’s argument. We know which plan is dead even though it still lives in a doc.

An agent has none of that unless we build it in. So in a company that runs on agents, memory is not a side project. It is the substrate every other agent reasons from. Build it first and everything downstream gets cheaper. Skip it and you pay for it in every agent you ship after.

The obvious agents all depend on it

The tempting builds are everywhere. Summarize calls. Draft support replies. Route product feedback. Write the weekly leadership brief. Almost all of them need to know which customers are strategic, which requests match the roadmap, what the company is optimizing for.

Without memory, each one is a clever intern with no onboarding. It helps, but you re-explain the company every time. That barely saves time.

What organizational memory actually means

Organizational memory is the company’s operating context, in a form a human can inspect and an agent can use.

It is not one big database where every document goes. That is the old failure mode: put everything in a wiki, hope people search it, watch it decay.

The test is behavioral. A memory item should change how an agent acts.

If it changes nothing, it is documentation, not memory.

Concretely, it is a handful of files:

memory/
  company-state.md
  decision-log.md
  workflow-index.md
  experiment-scorecards.md
  open-questions.md

Four of them hold the layers below. The fifth, open-questions.md, is where the agent writes down what it could not resolve, instead of guessing.

Layer one is the current state

The live operating context. company-state.md:

# Company state

Current priorities:
- Cut enterprise support response time
- Cut onboarding time for new customers
- Run the Q3 renewal risk review

Current constraints:
- No new headcount this quarter
- Support automation stays human-approved
- Reliability over feature breadth

Open decisions:
- Does renewal risk sit with CS or Finance?
- Which segments get white-glove support?

It is not meant to be complete. It is meant to be useful. It gives an agent enough to stop treating every task as if the company were born this morning.

Layer two is decision memory

What we decided, and why.

# Decision: Support automation stays human-approved

Date: 2026-06-03
Owner: Head of Support

Decision:
Agents may draft support replies. They may not send them.

Why:
- Enterprise customers expect an accountable response
- Drafts are useful but not yet reliable enough
- A wrong answer carries contract and trust risk

Review:
Revisit after 30 days of draft scorecards.

This is not bureaucracy. It is context compression. Instead of making every future agent and new hire reconstruct the reasoning, the reasoning stays retrievable.

Layer three is workflow memory

How the work runs. The role, the bounds, the output, the gate.

# Workflow: Product feedback routing

Agent role:
Classify the request, find related feedback, suggest a route.

Do not:
- Promise delivery dates
- Create roadmap commitments
- Reply to the customer

Output:
- Summary, category, related requests
- Suggested owner and confidence
- Escalation reason if confidence is low

Human approval:
Required before anything reaches the roadmap or the customer.

This is where agent engineering stops looking like prompting and starts looking like management. The role is defined. The output is specified. The approval gate is explicit.

Layer four is experiment memory

The layer most teams skip, and the most important one. Every experiment should leave a scorecard.

# Experiment: Chief of Staff daily brief

Goal:
Summarize priorities, risks, and open decisions for leadership.

What worked:
- Surfaced stale open decisions
- Connected a support trend to renewal risk

What failed:
- Overweighted one loud customer issue
- Missed a constraint sitting in the finance notes

Would I run it again?
Yes, with better retrieval from finance.

Otherwise the pattern repeats: someone demos an agent, everyone is impressed, the failure modes get discussed once, and the learning is gone by Friday.

A demo that works once is not a system.

This is also why retrieval matters. RAG gets sold as a trick for cutting hallucinations. Its real job is to make the agent answer from the company, not from the model. Memory is what there is to retrieve.

Start small. Keep it visible.

You do not need the right architecture to learn whether memory helps. You need a surface that is current, structured, and easy to inspect.

So start with files. A human can read them, an agent can use them, and you can diff, review, and delete them when they are wrong.

The first workflow I would run against them is a Chief of Staff daily brief. Read the files, cite the memory item behind each claim, and log anything unclear as an open question instead of guessing.

Later, if the memory grows, you can add retrieval, embeddings, and access controls. Not first.

How memory fails

In predictable ways.

It goes stale. A stale memory system is worse than none, because it gives old context the authority of current truth.

It gets too big. If everything is equally important, the agent has no signal. It retrieves noise and calls it context.

It gets too vague. A page of principles reads well, but an agent needs constraints, owners, and dates. Principles without those are documentation, not memory.

It hides accountability. If the memory says “the company decided” with no owner and no date, the decision becomes folklore.

It breeds false confidence. The agent cites a document, the citation looks official, and nobody notices it went stale in March.

So memory needs maintenance. Not much, but enough. Each item should carry an owner, a date, a status, and a review date. The goal is not a tidy wiki. It is memory an agent can use and a human can check.

It breaks at scale. Build it anyway.

Will five files survive a real company? Not for long. Concurrent agents will collide on writes. The logs will outgrow the context window and force a retrieval layer. Private context will need access controls the files do not have.

All true. All Day 30.

But Day 30 is a problem you earn by surviving Day 1, and most agent experiments never get there. You do not need a database to learn whether an agent can reason from the company. You need a surface that is current, inspectable, and honest. Build the database when the files start to hurt.

The practical rule

The question to start with is not which agent to build. It is what this agent needs to know that should not live only in someone’s head.

That moves the work upstream. From generation to memory. From prompting to context.

The smallest honest test is five steps:

Write the five memory files. Keep them short.
Point one agent at them, starting with the daily brief.
Make it cite every claim back to a memory item.
Check whether it stayed grounded, surfaced the right priorities, and named what it did not know.
Ask whether you would run it again tomorrow. If yes, you have a system. If no, you have a demo.

The first agent system is not the one that writes the best answer. It is the one that helps the organization keep what it already learned.

Once that memory exists, every other agent gets better. The support agent has context. The product agent has priorities. The finance agent has assumptions. The Chief of Staff agent has a company to reason from.

Without it, agents stay tourists. With it, they can reason from the company instead of the model.

This is the argument, not the build. The Chief of Staff brief is the experiment I would run next, written in the conditional on purpose. When I run it for real, the scorecard becomes the next post.

Related: What Developers Get Wrong About Building with AI — the older argument this one extends. Memory is the moat, not the capability.

Thinking Through AI

Discussion about this post

Ready for more?