Do AI Agents Need a Memory Layer? Mem0 vs Letta vs Zep
In short
Most AI agents do not need a dedicated memory layer. My reputation SaaS auto-reply agent has run in production for over a year on a single Postgres table with a pgvector embedding column, at $0 in extra infrastructure, and memory has never been the bottleneck. Vendors like Mem0, Letta, and Zep earn their per-turn latency and their bill only when you can name a concrete consolidation, decay, or cross-agent requirement. This post is the decision framework I now use before recommending any of them.

On this page
- What is an agent memory layer?
- Do you actually need one?
- Why is Mem0 the default pick in 2026?
- What makes Letta different?
- When does Zep's temporal knowledge graph win?
- How do Mem0, Letta, Zep, and plain Postgres compare?
- When does plain Postgres beat all three?
- Which memory option fits your app shape?
- Key takeaways
The category got real money behind it in late 2025. Mem0's $24M raise and AWS picking it as the exclusive memory provider for its Agent SDK turned agent memory from a research idea into a funded product category. By April 2026 the field had consolidated to four names: Mem0, Letta, Zep with its Graphiti engine, and LangMem. Mid-2026 is the moment teams feel forced to pick. Most of them should not pick anything yet.
What is an agent memory layer?
An agent memory layer is a service that extracts, consolidates, decays, and retrieves long-term facts about users and agents across sessions, beyond what fits in the context window. It is not RAG over documents. RAG retrieves knowledge from a corpus you control; memory accumulates facts about a relationship over time.
That one-sentence definition hides four distinct jobs:
- Extraction. During or after a conversation, an LLM pass pulls out durable facts: "user prefers concise answers", "user's company has 12 locations".
- Consolidation. New facts get merged with old ones. "User is vegetarian" plus a later "user ordered the chicken" needs conflict resolution, not two contradictory rows.
- Decay. Facts age out or get invalidated. "User lives in Austin" stops being true the day they move.
- Retrieval. On every turn, the layer injects the handful of facts relevant to the current message.
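To make the four jobs concrete, here is a minimal in-memory sketch. Every name in it (`ToyMemory`, `write`, `recall`, `maxAgeMs`) is hypothetical. It assumes extraction has already produced `{ key, value }` facts, resolves conflicts with a naive last-write-wins rule, and decays by age alone. Real products do each step with far more care.

```javascript
// Toy in-memory model of the four jobs. Every name here is hypothetical.
class ToyMemory {
  constructor({ maxAgeMs }) {
    this.maxAgeMs = maxAgeMs;
    this.byUser = new Map(); // userId -> Map(key -> { value, updatedAt })
  }

  // Extraction is assumed to have happened already (normally an LLM pass):
  // the caller hands in { key, value } facts.
  // Consolidation: a newer fact with the same key replaces the older one.
  write(userId, facts, now = Date.now()) {
    if (!this.byUser.has(userId)) this.byUser.set(userId, new Map());
    const userFacts = this.byUser.get(userId);
    for (const { key, value } of facts) {
      userFacts.set(key, { value, updatedAt: now });
    }
  }

  // Decay: facts older than maxAgeMs are deleted on read.
  // Retrieval: whatever survives is returned. A real layer would also
  // rank the survivors by relevance to the current message.
  recall(userId, now = Date.now()) {
    const userFacts = this.byUser.get(userId) ?? new Map();
    const live = [];
    for (const [key, fact] of userFacts) {
      if (now - fact.updatedAt > this.maxAgeMs) {
        userFacts.delete(key);
      } else {
        live.push({ key, value: fact.value });
      }
    }
    return live;
  }
}
```

Last-write-wins is the weakest possible consolidation rule. The vegetarian-versus-chicken example above is exactly the case it gets wrong, because a single order should not overwrite a standing preference.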
The contrast with RAG matters because teams keep buying memory products to solve retrieval problems. If your agent needs to answer questions from your docs, that is retrieval over a corpus, the kind of RAG system I build for clients at $4k to $12k for an MVP. I covered where retrieval itself is heading in my piece on agentic retrieval. Memory is the other axis: not what the agent knows, but what it remembers about you.
Do you actually need one?
You need a memory layer only if you answer yes to at least one of three questions: multi-session personalization, shared memory across parallel agents, or consolidation and conflict resolution. If the answer to all three is no, close the vendor tabs. A Postgres table covers you, and I will show the exact schema below.
Here is the framework as I actually run it:
- Do users return across sessions and expect the agent to remember them? A support bot that should recall last week's ticket: yes. A one-shot document summarizer: no.
- Do multiple agents or parallel sessions need to share one memory state? If agent A learns something agent B must act on within minutes, you have a shared-state problem that a naive table handles badly.
- Do memories need consolidation or decay? If facts about a user change and contradict over time, and stale facts cause wrong behavior, you need conflict resolution logic you probably should not write yourself.
If you can't name a consolidation or decay requirement, you don't need a memory vendor; you need a Postgres table with an embedding column.
That sentence has saved several of my consulting conversations from a premature vendor evaluation.
Why is Mem0 the default pick in 2026?
Mem0 is the funding-validated default. It raised $24M total (Series A led by Basis Set, announced October 28, 2025, with Y Combinator and Peak XV participating), has 48,000+ GitHub stars, and is the exclusive memory provider for AWS's Agent SDK; its API calls grew from 35M to 186M during 2025. As of June 2026, if you want a hosted memory API with an ecosystem that meets you halfway, Mem0 is the safe call.

Architecturally, Mem0 sits beside your agent as an API. You call add() with conversation messages and it runs LLM extraction, deduplication, and storage into a vector store with an optional graph layer; you call search() per turn to fetch relevant memories. The Node SDK (mem0ai on npm) is mature by JavaScript standards, with TypeScript types and near parity with the Python client for core operations, which was not true of every competitor when I evaluated in early 2026. Native integrations in CrewAI, Flowise, and Langflow mean low-code teams get it almost for free.
The trade-off to understand: the hosted platform is the smooth path, while self-hosting the Apache 2.0 core means operating your own vector store and paying for the extraction LLM calls yourself. Either way, every memory write is an LLM call somewhere. Budget for it.
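The per-turn shape of a sidecar memory API can be sketched without committing to any SDK. The function below takes a `memory` object exposing async `search()` and `add()`, mirroring the two calls described above. The `userId` option name, the `{ memory: string }` result shape, and the `llm` callback are all assumptions for illustration, so check the client you install for its real signatures.

```javascript
// Sketch of one chat turn with a sidecar memory API.
// Assumptions: memory.search() resolves to [{ memory: string }, ...],
// memory.add() accepts chat messages, and llm(prompt) resolves to a string.
async function answerWithMemory({ memory, llm, userId, userMessage }) {
  // Read path: sits on the critical path of every turn.
  const hits = await memory.search(userMessage, { userId });
  const facts = hits.map((h) => `- ${h.memory}`).join("\n");

  const reply = await llm(
    `Known facts about this user:\n${facts || "- (none yet)"}\n\nUser: ${userMessage}`
  );

  // Write path: started but not awaited, so extraction cost stays off
  // the user's latency. Failures are logged rather than thrown.
  const write = memory
    .add(
      [
        { role: "user", content: userMessage },
        { role: "assistant", content: reply },
      ],
      { userId }
    )
    .catch((err) => console.error("memory write failed:", err.message));

  return { reply, write };
}
```

Returning the `write` promise lets a caller await it in tests or before shutdown, while the normal request path ignores it.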
What makes Letta different?
Letta is the MemGPT lineage: memory the agent itself owns and edits. Instead of a sidecar API skimming facts in the background, Letta gives the agent named memory blocks that it rewrites with its own tool calls. In April 2026 Letta shipped Letta Code and a Conversations API for shared memory across parallel sessions.
Letta is heavier conceptually because it is closer to an agent runtime than a library. You adopt its server, its agent abstractions, and its view that memory editing is a tool the agent calls when it decides a fact matters. That is genuinely useful when the agent should reason about what to remember rather than have facts extracted behind its back. The Conversations API answers my second framework question directly: parallel sessions of one agent reading and writing shared memory blocks without trampling each other. If your architecture is a fleet of workers acting as one agent, Letta has the strongest answer of the three. The cost is tokens and complexity: the agent spends part of its budget managing memory, and you debug memory behavior as agent behavior.
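The idea of agent-owned memory is easier to see in code. The sketch below is not Letta's API. It is a hypothetical in-process model in which memory lives in named blocks and the only way to change a block is a tool call the model emits. The tool names `memory_append` and `memory_replace`, the `{ name, args }` call shape, and the character limit are assumptions chosen for illustration.

```javascript
// Hypothetical model of agent-owned memory blocks. Not Letta's API.
function createAgentMemory(initialBlocks, charLimit = 500) {
  const blocks = new Map(Object.entries(initialBlocks));

  // Every edit funnels through here so limits are enforced in one place.
  function setBlock(label, text) {
    if (!blocks.has(label)) return { ok: false, error: `unknown block: ${label}` };
    if (text.length > charLimit) return { ok: false, error: `block ${label} exceeds ${charLimit} chars` };
    blocks.set(label, text);
    return { ok: true };
  }

  const tools = {
    memory_append({ label, text }) {
      const current = blocks.get(label);
      return setBlock(label, current ? `${current}\n${text}` : text);
    },
    memory_replace({ label, oldText, newText }) {
      const current = blocks.get(label) ?? "";
      if (!current.includes(oldText)) return { ok: false, error: "oldText not found" };
      // Function replacer avoids "$" patterns in newText being interpreted.
      return setBlock(label, current.replace(oldText, () => newText));
    },
  };

  return {
    // Dispatch a tool call emitted by the model.
    applyToolCall({ name, args }) {
      return Object.hasOwn(tools, name)
        ? tools[name](args)
        : { ok: false, error: `unknown tool: ${name}` };
    },
    // Render all blocks into the text that would be pinned in the prompt.
    render() {
      return [...blocks].map(([label, text]) => `[${label}]\n${text}`).join("\n\n");
    },
  };
}
```

The error results matter as much as the successes: they go back to the model as tool output, which is why debugging memory here means debugging agent behavior.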
When does Zep's temporal knowledge graph win?
Zep wins when your facts are entities with relationships that change over time. Its Graphiti engine builds a temporal knowledge graph: facts carry valid-from and valid-to timestamps, so "user moved from Austin to Denver" invalidates the old city without deleting history. For CRM-style agent state, it is the strongest model here.
Where Mem0 stores discrete memory entries, Graphiti stores entities and edges with explicit time bounds. When a new fact contradicts an old one, the old edge gets an invalid-at timestamp instead of being overwritten, so the agent can answer both "where does this customer live" and "where did they live when they signed the contract". If your agent serves sales, support, or account management workflows where the state of record is people, companies, and subscriptions that evolve, Zep models that more honestly than flat entries. The graph build adds write-path cost, and for simple preference recall it is more machinery than you need.
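The invalidate-instead-of-overwrite behavior can be sketched in a few lines. This is a toy model, not Graphiti's data structures or API. The class and field names are hypothetical, and it assumes each subject and relation pair holds one value at a time and that facts arrive in chronological order.

```javascript
// Toy temporal fact store: edges are closed, never deleted.
// Assumes single-valued relations asserted in chronological order.
class TemporalFacts {
  constructor() {
    this.edges = []; // { subject, relation, object, validAt, invalidAt }
  }

  // Record a new fact and close any edge it contradicts.
  assertFact(subject, relation, object, validAt) {
    for (const edge of this.edges) {
      const sameSlot = edge.subject === subject && edge.relation === relation;
      if (sameSlot && edge.invalidAt === null) edge.invalidAt = validAt;
    }
    this.edges.push({ subject, relation, object, validAt, invalidAt: null });
  }

  // Answer "what was true at time asOf?" (milliseconds since epoch).
  lookup(subject, relation, asOf = Date.now()) {
    const edge = this.edges.find(
      (e) =>
        e.subject === subject &&
        e.relation === relation &&
        e.validAt <= asOf &&
        (e.invalidAt === null || asOf < e.invalidAt)
    );
    return edge ? edge.object : null;
  }
}

// Usage: the Austin edge is closed by the move, not deleted.
const crm = new TemporalFacts();
crm.assertFact("customer-1", "lives_in", "Austin", Date.parse("2024-01-01"));
crm.assertFact("customer-1", "lives_in", "Denver", Date.parse("2025-06-01"));
```

With that history, `crm.lookup("customer-1", "lives_in", Date.parse("2024-09-01"))` returns `"Austin"`, while a lookup dated after the move returns `"Denver"`.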
How do Mem0, Letta, Zep, and plain Postgres compare?
Here is the comparison as of June 2026, with latency figures from my own small-workload testing rather than vendor benchmarks. Treat the latency column as the deciding factor: a memory call sits on the critical path of every turn, so 200ms of added retrieval is a real tax on a chat or voice product.
| Option | Architecture | Hosted / self-host | Node SDK | Latency added per turn | Pricing | Best fit |
| --- | --- | --- | --- | --- | --- | --- |
| Mem0 | LLM extraction into vector store, optional graph | Both; core is Apache 2.0 | mem0ai, typed, mature | ~150-300ms retrieval; writes can run async | Free tier, then usage-based hosted plans | User personalization at scale, broad framework integrations |
| Letta | Agent-owned memory blocks (MemGPT lineage) | Both (Letta Cloud or self-hosted server) | Official SDK, runtime-flavored | Variable; memory edits add LLM calls inside the loop | Free self-host; usage-based cloud | Autonomous agents managing their own memory; parallel sessions via Conversations API |
| Zep | Temporal knowledge graph (Graphiti) | Both; Graphiti is open source | Official TypeScript SDK, solid | ~100-250ms retrieval; graph build async | Free tier, then usage-based hosted | Entity-heavy CRM-style state with time-aware facts |
| Postgres + pgvector | One table, embedding column, SQL | Any managed or self-hosted Postgres | pg, boring and proven | ~5-20ms in-VPC | $0 extra on your existing database | Single-tenant or under 10k users, simple preference recall |
Measure these numbers in your own stack before deciding. I trace per-turn latency with the OpenTelemetry setup from my agent observability guide, and the memory span is regularly the second-largest after the model call itself.
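If you do not have tracing in place yet, a plain timer is enough for a first measurement. The helper below is a stand-in, not the OpenTelemetry setup mentioned above. `timed` and its `record` callback are hypothetical names, and it relies only on the global `performance.now()` available in current Node releases.

```javascript
// Time any async step and report its duration, even when it throws.
// `record` is a hypothetical sink; swap in your logger or metrics client.
async function timed(label, fn, record = console.log) {
  const start = performance.now();
  try {
    return await fn();
  } finally {
    record({ label, ms: performance.now() - start });
  }
}
```

Wrap the memory call and the model call separately, for example `await timed("memory.search", () => recallMemories(userId, message))`, and compare the two numbers over a few hundred real turns rather than a single run.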
When does plain Postgres beat all three?
Plain Postgres with pgvector beats all three when you are single-tenant or under roughly 10k users, you need simple preference recall rather than consolidation, and your facts rarely contradict each other. The cost is $0 in extra infrastructure if you already run Postgres, and the similarity query adds roughly 5-20ms when the database sits in the same VPC.
The entire schema:
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE memories (
  id         BIGSERIAL PRIMARY KEY,
  user_id    TEXT NOT NULL,
  content    TEXT NOT NULL,
  embedding  VECTOR(1536) NOT NULL,
  category   TEXT NOT NULL DEFAULT 'general',
  updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX memories_user_idx ON memories (user_id, category);

CREATE INDEX memories_embedding_idx ON memories
  USING hnsw (embedding vector_cosine_ops);
And the retrieval function, the whole "memory layer" in about 30 lines of Node:
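import pg from "pg";
import OpenAI from "openai";

// One shared connection pool and API client per process.
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Return up to `limit` memories for a user, ranked by cosine similarity to `query`.
export async function recallMemories(userId, query, opts = {}) {
  const { category = null, limit = 5, minSimilarity = 0.55 } = opts;

  // text-embedding-3-small returns 1536 dimensions, matching VECTOR(1536).
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });
  // pgvector accepts a vector as the text literal "[v1,v2,...]".
  const embedding = `[${data[0].embedding.join(",")}]`;

  // <=> is pgvector's cosine distance, so similarity = 1 - distance.
  const { rows } = await pool.query(
    `SELECT content, category, updated_at,
            1 - (embedding <=> $2) AS similarity
       FROM memories
      WHERE user_id = $1
        AND ($3::text IS NULL OR category = $3)
      ORDER BY embedding <=> $2
      LIMIT $4`,
    [userId, embedding, category, limit]
  );

  // Drop weak matches so irrelevant facts never reach the prompt.
  return rows.filter((r) => r.similarity >= minSimilarity);
}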
The write path is one LLM call at session end that extracts facts as JSON and upserts them. Getting that JSON reliable is its own topic; I compared the options in my structured outputs piece.
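A minimal sketch of that write path follows, under several assumptions that are mine rather than part of the shipped system. The LLM is assumed to return a JSON array of `{ content, category }` objects. `db` is anything with a pg-style `query(text, params)` method, and `embed` is a hypothetical async function returning a number array. The upsert also assumes a unique index on `(user_id, content)`, which the schema in this post does not define.

```javascript
// Parse the LLM's JSON output defensively: bad output yields no facts.
function parseFacts(raw) {
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [];
  }
  if (!Array.isArray(parsed)) return [];
  return parsed
    .filter((f) => f && typeof f.content === "string" && f.content.trim() !== "")
    .map((f) => ({
      content: f.content.trim(),
      category: typeof f.category === "string" ? f.category : "general",
    }));
}

// Upsert each fact. ASSUMPTION: a unique index exists, for example
//   CREATE UNIQUE INDEX memories_user_content_idx ON memories (user_id, content);
async function saveFacts(db, embed, userId, rawLlmOutput) {
  const facts = parseFacts(rawLlmOutput);
  for (const fact of facts) {
    const vector = await embed(fact.content);
    await db.query(
      `INSERT INTO memories (user_id, content, embedding, category)
       VALUES ($1, $2, $3, $4)
       ON CONFLICT (user_id, content)
       DO UPDATE SET embedding = EXCLUDED.embedding,
                     category = EXCLUDED.category,
                     updated_at = now()`,
      [userId, fact.content, `[${vector.join(",")}]`, fact.category]
    );
  }
  return facts.length;
}
```

Exact-match upserts only deduplicate identical wording. Two phrasings of the same fact become two rows, which is acceptable for simple preference recall and is the point at which real consolidation would start to matter.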
This is exactly what I shipped for my reputation SaaS auto-reply agent. Business tone preferences ("formal, no exclamation marks, sign as Maria") and per-location facts ("free parking behind the building") live in this table, keyed by business and location. It has generated tens of thousands of review replies, and the memory table has never been the bottleneck; the model call always is. When I scope agent and automation projects for clients, this table is my default, and a memory vendor enters the picture the day memories need consolidation, decay, or sharing across multiple agents, not before.
Which memory option fits your app shape?
Map your app shape to the tree below. The branches encode the three framework questions plus the entity-versus-preference distinction. In my consulting work, around four out of five agent builds land on the Postgres leaf, and that ratio has held even after the 2025 funding wave made the vendors easy to adopt.
Do returning users expect the agent to remember them?
|
+-- No  -> skip the memory layer (context window + session store)
|
+-- Yes
    |
    +-- Facts are entities that change over time
    |   (customers, deals, locations)?           -> Zep (Graphiti)
    |
    +-- Agent should own and edit its memory,
    |   or parallel sessions share state?        -> Letta
    |
    +-- Hosted user-fact recall with broad
    |   framework integrations?                  -> Mem0
    |
    +-- Simple preference recall, <10k users,
        no consolidation or decay?               -> Postgres + pgvector
Start at the bottom leaf and earn your way up. Migrating a memories table into Mem0 or Zep later is a weekend of work; unwinding a vendor dependency you never needed is much worse.
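For teams that prefer a checklist they can run, the tree translates to a small function. The property names are hypothetical labels for the branches. The order of the checks follows a top-to-bottom reading of the tree, with Postgres as the fallthrough, which is my interpretation of "start at the bottom leaf".

```javascript
// Encode the decision tree. Property names are hypothetical labels.
function pickMemoryOption(app) {
  if (!app.returningUsers) return "no memory layer";
  if (app.entitiesChangeOverTime) return "Zep (Graphiti)";
  if (app.agentOwnsMemory || app.parallelSessionsShareState) return "Letta";
  if (app.needsHostedIntegrations) return "Mem0";
  // Default leaf: simple preference recall with no consolidation or decay.
  return "Postgres + pgvector";
}
```

An app with returning users and nothing else checked lands on `"Postgres + pgvector"`, which matches where most builds end up.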
Key takeaways
- An agent memory layer extracts, consolidates, decays, and retrieves long-term facts across sessions. It is not RAG over documents, and buying one to fix retrieval is the most common mistake I see.
- Run the three-question test: multi-session personalization, cross-agent shared state, consolidation and conflict resolution. Zero yes answers means you do not need a vendor.
- Mem0 is the funded default as of June 2026: $24M raised, 48,000+ GitHub stars, exclusive memory provider for AWS's Agent SDK, API calls up from 35M to 186M during 2025.
- Letta fits agents that own and edit their memory, especially parallel sessions sharing state through its April 2026 Conversations API. Zep fits entity-heavy, time-changing CRM-style facts.
- A Postgres table with a pgvector column costs $0 extra, adds 5-20ms per turn, and has carried my production auto-reply agent for over a year without becoming the bottleneck.
FAQ
What is the difference between RAG and agent memory?
RAG retrieves knowledge from a document corpus to ground answers; agent memory stores and retrieves facts about users and agents accumulated across sessions. RAG answers "what does the documentation say", memory answers "what does this user prefer". Both use embedding search, but memory adds extraction, consolidation, and decay that RAG never needs.
Can I build AI agent memory with just Postgres and pgvector?
Yes, and under roughly 10k users it is usually the right call. One table with user_id, content, embedding, category, and updated_at, plus a cosine-similarity query, covers preference recall. You give up automatic consolidation and decay, so it fits apps where facts rarely contradict. My production auto-reply agent runs on exactly this.
Is Mem0 open source and free for production?
The Mem0 core is open source under Apache 2.0, so you can self-host it in production for free; you still pay for your own LLM and embedding calls plus a vector store. The hosted platform has a free tier with usage-based paid plans above it as of June 2026. Verify current limits before committing.
Malik Hamza Shabbir · Full-Stack & AI Engineer
I build full-stack and AI products solo: a reputation SaaS in production, RAG pipelines, and React Native apps. I write from what I ship, not from documentation summaries.