What Is AI Agent Memory? Working, Episodic, Semantic, and Procedural

Most people’s mental model of AI memory is the context window. You paste in some text, the model reads it, it answers. The bigger the window, the better the memory. Right?

Not really. A context window is short-term attention, not memory. When the conversation ends, it’s gone. When the conversation gets long enough, models tend to recall details from the middle of it less reliably, so that part is effectively gone too. And a context window is per-conversation: it knows nothing about the fifty conversations you had last month. Calling that “memory” is like calling the sentence you’re currently reading your “life experience.”

Real agent memory is several different systems doing several different jobs. Borrowing the vocabulary from cognitive science is useful here, because the human distinctions map cleanly onto what an autonomous agent actually needs. There are four layers. Each answers a different question.

Working memory: what’s happening right now

Working memory is the scratchpad for the current task. The user’s last few messages, the intermediate results of the step the agent is on, the variables it’s juggling to finish what it’s doing.

This is the layer most people mistake for the whole of memory, because it’s the one the context window provides. It’s essential and it’s also the most expendable — once the task is done, almost none of it needs to survive. You don’t remember the exact wording of every sentence in a conversation from six months ago. You remember what it was about. Working memory is the raw material; the durable layers are what’s distilled from it.

For an agent, working memory is cheap, fast, and local. The design challenge isn’t storing it — it’s deciding what to promote out of it before it disappears.
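Here is a minimal sketch of that promotion decision in Python. The names (`WorkingMemory`, `promote`, the `kind` labels) are hypothetical and not taken from any particular framework. The sketch assumes each scratchpad item is tagged with what kind of thing it is, and that only a few kinds are worth keeping once the task ends.

```python
from collections import deque
from dataclasses import dataclass

# Assumption: these are the kinds of items worth keeping after a task ends.
DURABLE_KINDS = {"decision", "preference", "correction", "outcome"}


@dataclass
class Item:
    kind: str   # e.g. "message", "intermediate", "preference"
    text: str


class WorkingMemory:
    """Bounded scratchpad for the current task (hypothetical design)."""

    def __init__(self, max_items: int = 50):
        # Oldest items fall off automatically once the buffer is full.
        self.items = deque(maxlen=max_items)

    def add(self, kind: str, text: str) -> None:
        self.items.append(Item(kind, text))

    def promote(self) -> list[Item]:
        """Return the items worth handing to durable memory, then clear."""
        kept = [item for item in self.items if item.kind in DURABLE_KINDS]
        self.items.clear()
        return kept
```

Calling `promote()` at the end of a task hands back only the tagged survivors and empties the scratchpad, which is the "distill, then discard" behavior described above. A real system would likely use a model or a scoring function instead of fixed tags.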

Episodic memory: what happened

Episodic memory is the record of specific events: “Last Tuesday this user asked me to build a revenue dashboard, I connected their Stripe account, and they preferred the weekly view over the monthly one.” It’s autobiographical. It’s tied to a time and a context.

This is the layer that lets an agent say “last time you asked for this, we did it that way” — which is the difference between a tool that feels like it knows you and a tool that greets you as a stranger every morning. Episodic memory is why the second conversation can be better than the first.

The hard part of episodic memory is selectivity. If an agent tried to remember every event in full detail, it would drown — and retrieval would slow to a crawl as the store filled with noise. Good episodic memory is lossy on purpose. It keeps the events that carried signal (decisions, preferences, corrections, outcomes) and lets the rest fade.
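One way to picture "lossy on purpose" is a store that filters at write time. The sketch below is an illustration under assumptions, not a reference design: `Episode`, `EpisodicStore`, and the event-type labels are invented here, and keyword matching stands in for whatever retrieval a real system would use.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumption: events are tagged by type, and only these types carry signal.
SIGNAL_TYPES = {"decision", "preference", "correction", "outcome"}


@dataclass
class Episode:
    when: datetime
    user_id: str
    event_type: str
    summary: str


class EpisodicStore:
    """Append-only log that drops low-signal events at write time."""

    def __init__(self):
        self.episodes: list[Episode] = []

    def record(self, episode: Episode) -> bool:
        if episode.event_type not in SIGNAL_TYPES:
            return False  # let it fade: never stored
        self.episodes.append(episode)
        return True

    def recall(self, user_id: str, keyword: str, limit: int = 3) -> list[Episode]:
        """Most recent matching episodes for this user, newest first."""
        matches = [
            e for e in self.episodes
            if e.user_id == user_id and keyword.lower() in e.summary.lower()
        ]
        return sorted(matches, key=lambda e: e.when, reverse=True)[:limit]
```

Because noise is rejected before it is stored, `recall` only ever searches events that were worth keeping, which is what keeps retrieval fast as the store grows.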

Semantic memory: what’s true

Semantic memory is facts, stripped of the episode that produced them. Not “last Tuesday the user said they’re in the EU” but simply: this user is in the EU. Not the conversation where they mentioned their company has 40 employees, but the standing fact that it does.

Semantic and episodic memory work as a pair. Episodes are the events; semantic facts are the conclusions you draw from them and keep. The first time a user corrects a currency from dollars to euros, that’s an episode. The durable fact — this user works in euros — is semantic. From then on, the agent shouldn’t need the episode to get the currency right.
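The episode-to-fact step can be sketched as a small key-value store. Everything named here (`Fact`, `SemanticStore`, the `"user:42"` identifier) is hypothetical. The assumption is that a fact is addressed by a subject and an attribute, and that the originating episode is kept only as a pointer for auditing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fact:
    subject: str      # who or what the fact is about
    attribute: str    # e.g. "currency"
    value: str        # e.g. "EUR"
    source: str       # pointer back to the episode, kept only for auditing


class SemanticStore:
    """One current value per (subject, attribute) pair."""

    def __init__(self):
        self._facts: dict[tuple[str, str], Fact] = {}

    def learn(self, fact: Fact) -> None:
        self._facts[(fact.subject, fact.attribute)] = fact

    def get(self, subject: str, attribute: str,
            default: Optional[str] = None) -> Optional[str]:
        fact = self._facts.get((subject, attribute))
        return fact.value if fact else default


store = SemanticStore()
# The episode: the user corrected dollars to euros in some session.
store.learn(Fact("user:42", "currency", "EUR", source="episode:currency-correction"))
currency = store.get("user:42", "currency", default="USD")
```

After the `learn` call, the lookup no longer touches the episode at all. The agent reads the currency directly, which is the point of separating the two layers.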

This is also where a knowledge base or RAG (retrieval-augmented generation) vault lives: documents, reference material, and domain knowledge the agent can retrieve and reason over. Semantic memory is the agent’s model of how the world, and specifically your world, actually is.

Procedural memory: how to do things

Procedural memory is skill. How to do a thing, not what happened or what’s true. For a human it’s riding a bike or touch-typing — knowledge that lives in the doing rather than in facts you could recite.

For an agent, procedural memory is the accumulated know-how of how to accomplish a task well in this specific environment: which sequence of steps reliably builds a working dashboard, which tool to reach for in which situation, which approaches have failed before and shouldn’t be tried again. It’s the layer that lets the system get better at a recurring task rather than just remembering that the task happened.
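As a rough illustration, procedural memory can be approximated by tracking which approach succeeded for which task. This is a simplified sketch with invented names (`ProceduralMemory`, `best_approach`). It assumes tasks and approaches can be labeled with strings and that each attempt ends in a clear success or failure.

```python
from collections import defaultdict


class ProceduralMemory:
    """Tracks which approach has worked for which task (hypothetical design)."""

    def __init__(self):
        # task -> approach -> [successes, failures]
        self._stats = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record(self, task: str, approach: str, succeeded: bool) -> None:
        self._stats[task][approach][0 if succeeded else 1] += 1

    def best_approach(self, task: str):
        """Approach with the highest smoothed success rate, or None if unseen."""
        approaches = self._stats.get(task)
        if not approaches:
            return None

        def score(counts):
            wins, losses = counts
            # Laplace smoothing so one lucky run doesn't beat a proven method.
            return (wins + 1) / (wins + losses + 2)

        return max(approaches, key=lambda name: score(approaches[name]))
```

The smoothing is a design choice: an approach that worked once scores lower than one that worked three times out of three, so the agent leans toward methods with a track record while failed approaches sink.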

The four layers compound. Working memory runs the current step. Episodic memory recalls how similar steps went. Semantic memory supplies the stable facts the step depends on. Procedural memory supplies the method. An agent missing any one of them has a characteristic blind spot: without working memory it can’t finish a task; without episodic memory it can’t learn from your history; without semantic memory it re-asks what it has already been told; without procedural memory it never improves.
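To make the compounding concrete, here is one hedged sketch of how the four layers might feed a single step. The function `build_context` and its section titles are assumptions made for illustration. It simply assembles whatever each layer returns into one block of text that an agent could hand to its model.

```python
def build_context(working, episodes, facts, procedure):
    """Assemble one step's context from the four layers.

    working, episodes, procedure: lists of short strings.
    facts: dict mapping attribute name to value.
    """
    sections = [
        ("Known facts", [f"{k}: {v}" for k, v in facts.items()]),
        ("Relevant history", episodes),
        ("Method", procedure),
        ("Current task", working),
    ]
    lines = []
    for title, entries in sections:
        if not entries:
            continue  # a missing layer simply contributes nothing
        lines.append(f"## {title}")
        lines.extend(f"- {entry}" for entry in entries)
    return "\n".join(lines)
```

An empty layer drops out of the result silently, which mirrors the blind spots described above: the agent still runs, it just runs without that kind of knowledge.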

The problem nobody talks about: memory rots

Here’s what most discussions of agent memory leave out. Memory isn’t just a storage problem. It’s a maintenance problem, and an unmaintained memory store gets worse over time, not better.

Think about what accumulates. The user says they prefer weekly views. Three months later they switch to monthly. Now the store holds two contradictory preferences, and the agent has no idea which one is current. The same fact gets written five slightly different ways across five sessions, so retrieval returns near-duplicates that crowd out everything else. Facts that were true a quarter ago — last year’s budget, a deprecated workflow, a project that shipped — are still sitting there, indistinguishable from the live ones.

This is memory rot, and it’s the reason a lot of “memory-enabled” agents feel worse after a month of use than they did on day one. The store fills with stale, redundant, and contradictory entries. Retrieval quality drops, because every lookup now has to pick the live fact out of a crowd of near-duplicates and expired ones. The agent starts confidently acting on facts that have expired.

A storage layer alone doesn’t fix this. You need an active process that maintains the store, much as sleep is thought to consolidate human memory: pruning, merging, and reorganizing what was learned so the useful parts strengthen and the noise fades.

Consolidation: memory that maintains itself

The fix is a consolidation pass that runs in the background, between sessions, and does three jobs.

Deduplicate. When the same fact has been written multiple ways, collapse it into one canonical entry. Five near-identical notes about a user’s timezone become one.

Supersede. When a newer fact contradicts an older one, the new one wins and the old one is retired. “Prefers weekly” gets replaced by “prefers monthly” rather than stored alongside it, which would leave the agent to guess.

Decay. When a fact hasn’t been relevant in a long time and nothing reinforces it, let it fade. Not everything deserves to be remembered forever; a healthy memory forgets on purpose, the same way you’ve forgotten the details that didn’t matter.
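The three jobs can be sketched as a single pass over the store. This is a toy version under stated assumptions, not a description of any production system: the `Entry` shape, the `consolidate` function, and the 180-day idle threshold are all invented for illustration. Near-duplicates are detected by simple text normalization, where a real system would more likely compare meaning.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta


@dataclass
class Entry:
    key: str               # what the fact is about, e.g. "user/view"
    value: str             # e.g. "weekly"
    written_at: datetime
    last_used_at: datetime


def normalize(value: str) -> str:
    # Assumption: near-duplicates differ only in case and whitespace.
    return " ".join(value.lower().split())


def consolidate(entries, now, max_idle=timedelta(days=180)):
    """Return a cleaned copy of the store: deduplicated, superseded, decayed."""
    latest = {}
    for e in sorted(entries, key=lambda e: e.written_at):
        current = latest.get(e.key)
        if current and normalize(current.value) == normalize(e.value):
            # Deduplicate: same fact restated; keep one entry, refresh its usage.
            latest[e.key] = replace(
                current, last_used_at=max(current.last_used_at, e.last_used_at)
            )
        else:
            # Supersede: a newer, different value for the same key wins.
            latest[e.key] = e
    # Decay: drop anything that has gone unused for longer than max_idle.
    return [e for e in latest.values() if now - e.last_used_at <= max_idle]
```

Given two spellings of a timezone, a weekly preference followed by a monthly one, and a budget note nobody has read in most of a year, this pass returns one timezone entry, the monthly preference, and no budget note. The input list is left untouched, so the pass can be run between sessions and its result swapped in.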

At DeepHarness this consolidation is autonomous — you don’t trigger it, and you don’t curate the store by hand. Memory is keyed to your stable identity rather than a browser session, so it follows you across devices and conversations, and the consolidation runs on its own so the store gets cleaner with use instead of more cluttered. The goal is a system where the hundredth conversation is the best one, because by then the agent’s picture of you is well-formed, current, and free of the contradictions that would have accumulated in a store nobody maintained.

Memory is what makes an agent an agent

A model without memory is a brilliant consultant with amnesia — sharp in the moment, useless as a long-term partner, because every engagement starts from zero. The thing that turns a capable model into a system that compounds in value is the memory architecture wrapped around it: the four layers that let it finish a task, learn from your history, know what’s true, and get better at the work — plus the consolidation that keeps all of it honest over time.

When you evaluate an AI agent platform, don’t ask how big the context window is. Ask what it remembers between sessions, how it decides what’s worth keeping, and what happens to a fact when it stops being true. The answers tell you whether you’re looking at memory, or just a very large short-term attention span.