
A few days ago, Taranjeet, the CEO of Mem0, reacted to one of my articles about building AI memory with knowledge graphs. That caught my attention.
Mem0 is one of the most popular memory frameworks in the AI space. Thousands of developers use it. And here I was, running a heavier, more expensive architecture with Graphiti and Neo4j for my personal project.
Was I over-engineering this?
I had to find out. So I built a benchmark.
Quick Context: Why I Care About AI Memory
I've been building Synapse, an AI companion for my wife. Not a chatbot. A companion that remembers her life, her relationships, her emotional states, and how all of that connects over time.
It started with a 35,000-token "Master Prompt" that she maintained manually in Notion. Every time something changed in her life, she updated it by hand. That obviously didn't scale. So I moved to Graphiti, a knowledge graph framework that extracts entities and relationships from conversations automatically.
I wrote about this journey in two previous articles:
- Beyond RAG: Building an AI Companion with Deep Memory Using Knowledge Graphs (how knowledge graphs replaced the manual prompt)
- Scaling AI Memory: How I Tamed a 120K-Token Prompt with Deterministic GraphRAG (how I kept the prompt under control as the graph grew)
The system works well. But when I started looking at Mem0, I realized they solve some of the same problems (fact extraction, deduplication, contradiction handling) with a different architecture. They use a vector store as the primary brain and offer an optional graph layer on top. Fewer LLM calls per ingestion, and a fundamentally different take on how to combine vectors and graphs.
I wanted to understand both approaches. What does storing everything in one graph give you? What does splitting vectors and graphs into independent stores give you? What do you lose in each case?
Two Fundamentally Different Philosophies
Before the benchmark, let me explain what each system actually does under the hood. They both ingest conversations and store memories. But the architecture is completely different.
Graphiti: The Unified Graph
Graphiti puts everything in one place: a Neo4j graph database. Entities become nodes. Facts become edges. Embeddings live as properties on those nodes and edges.
The key detail: each edge carries a full natural-language fact, plus temporal fields. When a fact becomes outdated, Graphiti doesn't delete it. It marks it with an invalid_at timestamp and creates the new fact alongside it.
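To make the invalidation pattern concrete, here's a minimal sketch in plain Python. This is not Graphiti's actual API — the `Edge` dataclass and `supersede` helper are illustrative, borrowing the field names from the edge example shown later in this article:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Illustrative sketch of Graphiti-style temporal invalidation.
# Outdated facts are marked with invalid_at instead of being deleted,
# so history survives alongside the current state.

@dataclass
class Edge:
    source: str
    target: str
    relation_type: str
    fact: str
    valid_at: date
    invalid_at: Optional[date] = None

def supersede(edges: list[Edge], new_edge: Edge) -> None:
    """Invalidate any still-current edge for the same relation, then append."""
    for e in edges:
        if (e.source, e.relation_type) == (new_edge.source, new_edge.relation_type) \
                and e.invalid_at is None:
            e.invalid_at = new_edge.valid_at  # mark outdated, keep the history
    edges.append(new_edge)

edges = [Edge("Demy", "Roots MMA", "TRAINS_AT",
              "Demy trains BJJ at Roots MMA", date(2025, 6, 1))]
supersede(edges, Edge("Demy", "Iron Flow", "TRAINS_AT",
                      "Demy switched gyms and now trains at Iron Flow",
                      date(2026, 3, 1)))

current = [e for e in edges if e.invalid_at is None]
history = [e for e in edges if e.invalid_at is not None]
```

The point of the pattern: a retrieval layer can render `history` entries with an `[OUTDATED]` tag instead of silently dropping them, so the LLM keeps past and present straight.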
Mem0: The Split Architecture
Mem0 takes a different approach. The primary brain is a vector store (Qdrant, Pinecone, etc.) holding atomic fact strings. It has an optional graph (Neo4j), but it runs as a completely independent parallel system.
The vector store holds rich text. The graph holds thin triples: entity -> relationship_type -> entity. No natural-language facts on edges. No temporal fields. And critically: the two stores share no IDs and run independently. They can drift out of sync.
What's Actually Stored on an Edge
This is the single most important difference. Let me show it concretely.
Graphiti edge:
source: "Demy"
target: "Maplewood"
relation_type: "WORKS_AT"
fact: "Demy started working at the startup Maplewood
doing full-stack work, not just backend"
valid_at: 2026-02-15
invalid_at: null
embedding: [0.012, -0.034, ...]
Mem0 graph edge:
source: "demy"
target: "maplewood"
relationship: "WORKS_AT"
valid: true
mentions: 2
Graphiti stores the full story on every edge. Mem0 stores the label on the graph edge and puts the text in the vector store as a separate entry. For retrieval this means: Graphiti can give you structure AND semantics in one query. Mem0 needs two separate lookups and hopes they align.
The "Aha!" Moment: Context Blindness
Before I show you the benchmark results, I need to explain the insight that made this comparison matter to me. Because the results only make sense once you understand what "context blindness" means in practice.
The Problem with Pure RAG
Most AI memory systems work like this: user asks something, you do a similarity search, you inject the top-K results into the prompt. Simple and effective.
But there's a hidden cost. The LLM only sees what the similarity search returns. If the user asks about work, and the search returns work facts, the model has no idea about the emotional context from childhood that might be relevant. It's blind to everything outside the search window.
I call this context blindness: the LLM's intelligence is limited by the narrow slice of memory that semantic similarity surfaces for each turn.
Why This Matters for a Companion
Modern models are incredible at reasoning over large contexts. Give them 50k tokens of well-organized information about a person's life, and they make connections you didn't explicitly ask for. They notice patterns. They bring up relevant history naturally.
But you can't give them everything. That's expensive and noisy. So the question becomes: how do you decide what the model should always know vs what it should retrieve on demand?
The Synapse Approach: Base Context + RAG for the Long Tail
This is the architecture I built for Synapse, which I call Hydration V2:
Base Context: A budget-aware prompt (~30k tokens) that always includes the most important entities. I use the graph structure, specifically node degree (how many connections an entity has), to find the "hubs" of her life. Elena (mom), Noa (partner), Marco (tech lead). These always go in.
RAG for Long Tail: Similarity search only kicks in for specific details that don't fit in the base context. And here's the trick: I track exactly which facts are already in the base prompt.
# The metadata contract. Cortex sends this on every request
{
"compilationMetadata": {
"is_partial": true,
"included_node_ids": ["uuid-elena", "uuid-noa", "uuid-marco"],
"included_edge_ids": ["uuid-works-at", "uuid-diagnosed-with", ...]
}
}
When RAG retrieves results, I cross-reference against this list and drop any facts already in context. No duplication. No wasted tokens.
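The de-duplication step is simple enough to sketch. The field names mirror the metadata contract above; the shape of a retrieval hit is a hypothetical simplification, not Synapse's actual payload:

```python
# Drop RAG hits whose node/edge IDs are already covered by the base context.
# Field names follow the compilationMetadata contract; the hit structure
# is a simplified assumption for illustration.

def filter_rag_hits(hits: list[dict], metadata: dict) -> list[dict]:
    meta = metadata["compilationMetadata"]
    in_context = set(meta["included_node_ids"]) | set(meta["included_edge_ids"])
    return [h for h in hits if h["id"] not in in_context]

metadata = {
    "compilationMetadata": {
        "is_partial": True,
        "included_node_ids": ["uuid-elena", "uuid-noa"],
        "included_edge_ids": ["uuid-works-at"],
    }
}
hits = [
    {"id": "uuid-works-at", "fact": "Demy works at Maplewood"},      # already in context
    {"id": "uuid-bjj-belt", "fact": "Demy was promoted to purple belt"},
]
fresh = filter_rag_hits(hits, metadata)
```

Only the second hit survives; the first is already in the base prompt and would just burn tokens.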
Why This Only Works with Co-located Semantics
Here's the thing: this metadata contract requires that nodes and edges live in the same store with shared IDs. I go from "Elena has high degree" to "here are Elena's facts" in one database query.
With Mem0's split architecture, this is impossible. The graph knows Elena is important (she has many connections). But Elena's actual facts live in the vector store under different IDs. There's no direct link between the graph entity "elena" and the vector memories about Elena. You'd need to search the vector store by text similarity to find Elena-related facts. Which is exactly the context blindness problem you're trying to avoid.
Could you build a mapping table between vector IDs and graph entities? Sure. But at that point you're building a co-location layer on top of a split architecture. You're rebuilding what Graphiti gives you for free.
The Benchmark: What I Actually Tested
I built a 3-phase benchmark using a fictional user profile (Demy) with complex life situations: an ASD diagnosis, workplace dynamics, BJJ training, family trauma, and relationship changes.
Important caveat: Synapse doesn't use advanced graph features like BFS traversal or multi-hop queries. It does hybrid search: BM25 + cosine similarity + RRF reranking. So this benchmark doesn't test "graph retrieval" in the academic sense. It tests something more practical: what you gain or lose in retrieval quality when semantic context and graph entities live together vs apart.
The Setup
Both systems got the exact same data, same LLM (gpt-4.1-mini), same embedding model (text-embedding-3-small). Graphiti searched with the same SearchConfig that Cortex uses in production (edge + node hybrid with RRF). Mem0 searched with both vector memories AND graph relations in parallel.
Every phase ended with a gemini-3-flash-preview assessment that scored both systems on relevant dimensions (1-5 scale).
The full benchmark is open source. You can run it yourself.
Phase 1: Knowledge Extraction
Four conversations ingested: an ASD Level 1 diagnosis, workplace feedback from tech lead Marco, a BJJ blue belt promotion, and childhood memories with mother Elena.
Then I ran 5 knowledge probes: factual, relational, event-based, emotional, and workplace queries.
Phase 2: Contradiction Handling
Six facts changed: new job (Maplewood startup), belt upgrade (blue to purple), gym switch (Roots MMA to Iron Flow), breakup (Noa), role change (backend to full-stack), and new pet (Pixel the cat).
Both systems were probed before and after the updates.
Phase 3: Story Retention
A rich 14-message narrative about a traumatic childhood event called "the forest event." A camping trip with family, sensory overload at a campfire, going nonverbal, the mother's reaction, a fight between parents that led to their divorce, and 20 years of guilt. Sensory triggers. EMDR therapy plans.
This was the hardest test. Can atomic fact extraction preserve a story's connective tissue?
The Results
Cost: Mem0 Wins
No surprise here. Graphiti's richer pipeline costs more.
| Phase | Graphiti (tokens) | Mem0 (tokens) | Ratio |
|---|---|---|---|
| Phase 1 (4 sessions) | 34,632 | 25,394 | 1.36x |
| Phase 2 (2 sessions) | 25,601 | 14,532 | 1.76x |
| Phase 3 (1 session, 14 msgs) | 26,900 | 11,936 | 2.25x |
| Total | 87,133 | 51,862 | 1.68x |
The ratio increases with narrative complexity. Phase 3's single story session cost 2.25x more with Graphiti, driven by its entity deduplication pipeline checking each new edge against the entire existing graph.
Phase 1: Knowledge Coverage
| Dimension | Graphiti | Mem0 |
|---|---|---|
| Fact completeness | 5 | 4 |
| Entity relations | 5 | 2 |
| Specificity | 5 | 4 |
| Retrievability | 4 | 3 |
| Overall | 4.75 | 3.25 |
Graphiti won 4 of 5 probes. Its entity summaries added context that Mem0 lacked. Marco's entity node included the specific date of the 1-on-1 and the feedback details, making retrieval sharper.
But two problems showed up in Mem0's results that I didn't expect:
Problem 1: Top-K crowding. When I asked "What feedback did Marco give Demy?", Mem0's vector search returned childhood memories about Elena alongside the Marco results. The emotional weight of those embeddings dominated the similarity rankings and pushed relevant results down. The graph relations were even worse, returning elena -> enrolled -> demy and elena -> is_mom_of -> demy for a workplace query.
Problem 2: Graph retrieval noise. Mem0's graph search returns structural neighbors without semantic awareness. It doesn't know that Elena triples are irrelevant to a Marco query. It just returns whatever is connected. This happened in 3 of 5 probes.
Phase 2: Contradictions (The Split-Brain Problem)
| Dimension | Graphiti | Mem0 |
|---|---|---|
| Temporal handling | 5 | 2 |
| Current fact retrieval | 4 | 3 |
| Additive facts | 5 | 5 |
| Historical awareness | 5 | 2 |
| Overall | 4.75 | 3.0 |
Graphiti's temporal invalidation worked as expected. When I searched for "Who is Demy's partner?" after the breakup, the old Noa edge appeared clearly marked:
User is processing their neurodivergent experience
with the support of Noa. [OUTDATED]
An LLM reading this knows Noa is history, not present. It can say "I remember Noa" without confusing past and present.
Mem0 had a different problem. After the purple belt update, both facts appeared as equally current:
- Got a purple belt last month in martial arts
- Got promoted to blue belt at Roots MMA
No way to tell which is current. Both just exist side by side.
But the most interesting finding was the split-brain. When Demy switched gyms from Roots MMA to Iron Flow:
- Mem0's graph correctly updated: demy -> trains_at -> iron_flow_gym
- Mem0's vector store still prominently featured: "Feels physically exhausted but mentally regulated after training" (from the Roots MMA era)
The two independent stores drifted out of sync. The graph knew one thing, the vectors said another. This is an architectural consequence, not a bug. The two stores process the same messages independently with no cross-referencing.
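A toy illustration of that drift — not Mem0's code, just two independent stores ingesting the same message with no cross-referencing, which is the architectural shape described above:

```python
# Toy split-brain simulation: a graph store and a vector store process
# the same update message independently. The graph replaces the triple;
# the vector store keeps the stale training memory, because nothing
# explicitly contradicts it.

class GraphStore:
    def __init__(self):
        self.triples = {("demy", "trains_at"): "roots_mma"}

    def ingest(self, msg: str):
        if "Iron Flow" in msg:  # relation extraction overwrites the old triple
            self.triples[("demy", "trains_at")] = "iron_flow_gym"

class VectorStore:
    def __init__(self):
        self.memories = [
            "Feels physically exhausted but mentally regulated after training"
        ]

    def ingest(self, msg: str):
        # The gym switch doesn't contradict the old training memory,
        # so it survives untouched next to the new fact.
        self.memories.append("Switched gyms from Roots MMA to Iron Flow")

msg = "I left Roots MMA and joined Iron Flow this week."
graph, vectors = GraphStore(), VectorStore()
graph.ingest(msg)
vectors.ingest(msg)
```

After ingestion, the graph says Iron Flow while the Roots-era memory still ranks in vector search — two honest stores, one inconsistent picture.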
Both systems handled purely additive facts well. Pixel the cat was correctly stored by both. Graphiti even caught a secondary effect: the improved relationship with Rodrigo who helped pick out the cat.
Phase 3: Story Retention (The Surprise)
This is where it got interesting. I expected Graphiti to dominate again. It didn't.
Graphiti extracted 16 story-related edges. Clean entity connections:
- [Elena -> Tomas] Elena and Tomas had a major fight
during the camping trip, leading to...
- [Tomas -> Elena] Tomas was married to Elena until
their separation about a year after the forest event [OUTDATED]
- [User -> Dr. Vega] User is being treated by Dr. Vega
who helps them understand the forest event trauma
Mem0 extracted 12 story-related memories. Different kind of detail:
- Has sensory triggers related to the event: smell of
wood smoke, sound of running water, someone screaming a name
- Carried guilt for nearly 20 years believing the event
caused parents' separation
- Experienced sensory overload on the second night due to
noise, smoke, and flickering light
- Experienced a recent trigger in a park when someone
yelled a name loudly, causing them to freeze
The pattern was clear:
- Graphiti captured the causal structure: who did what to whom, what led to what, entity connections. The skeleton of the story.
- Mem0 captured the lived experience: sensory triggers, emotional weight, the 20-year guilt, the specific park incident. The flesh of the story.
When I asked "What are Demy's sensory triggers?", Graphiti returned generic references to the forest event. Mem0 returned the exact three triggers: wood smoke, running water, someone screaming a name.
When I asked "Why did Demy's parents separate?", Graphiti returned the direct causal chain: fight during camping -> separation a year later. Mem0 returned the emotional aftermath but with weaker causation.
For a companion that needs to both understand the story structure AND respond with emotional awareness, neither system alone was complete.
What I Learned
Mem0 is genuinely good at vector retrieval
Looking at the data fairly, Mem0's atomic fact extraction produces high-quality, well-crafted memories. "Feels anger about not knowing earlier, which might have prevented burnout." "At a cousin's birthday party, hide in the bathroom for 45 minutes due to the loud noise." These are clean, specific, and individually useful.
For a standard RAG pipeline (similarity search against a query, inject top results), Mem0's memories are arguably better optimized than Graphiti's edge facts, which are structured around entity pairs rather than standalone readability.
But vector retrieval alone creates blind spots
The top-k crowding problem is real. When all your memories are independent vectors with no structural awareness, emotionally heavy content dominates similarity rankings. Childhood trauma bleeds into workplace queries. The system has no way to say "these facts are about Elena, those are about Marco" without relying entirely on embedding distance.
This is what I mean by context blindness. The LLM only sees what similarity search surfaces. And similarity search doesn't understand life categories. It understands embedding proximity.
Co-located semantics are the key differentiator
The practical advantage of Graphiti isn't graph traversal (I don't use it). It's that entities and their facts live together. This enables:
- Knowing what matters: node degree tells you Elena is a hub entity
- Getting the full picture: one query returns both the entity summary and all its facts
- Tracking what's in context: the metadata contract prevents duplicate retrieval
With Mem0, you can know Elena is structurally important (the graph tells you). But getting Elena's rich facts requires a separate vector search, and that search might return non-Elena results based on embedding similarity. The two stores don't talk to each other.
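Here's a sketch of the co-located pattern: one store where edges carry both structure (source/target) and the full fact text, so hub detection by degree and fact gathering hit the same records in one pass. The data is hypothetical and this is plain Python, not a Neo4j or Graphiti query:

```python
from collections import Counter

# Co-located semantics in miniature: each edge holds structure AND the
# natural-language fact, so finding hubs and collecting their facts
# needs only one data source. Hypothetical sample data.

edges = [
    {"source": "demy",  "target": "elena", "fact": "Elena is Demy's mother"},
    {"source": "demy",  "target": "elena", "fact": "Elena enrolled Demy in school early"},
    {"source": "demy",  "target": "noa",   "fact": "Demy was in a relationship with Noa"},
    {"source": "demy",  "target": "marco", "fact": "Marco gave Demy feedback in a 1-on-1"},
    {"source": "elena", "target": "tomas", "fact": "Elena and Tomas fought during the camping trip"},
]

def hubs(edges, top_n=2, exclude=frozenset({"demy"})):
    """Rank entities by node degree, skipping the user themself."""
    degree = Counter()
    for e in edges:
        degree[e["source"]] += 1
        degree[e["target"]] += 1
    return [n for n, _ in degree.most_common() if n not in exclude][:top_n]

def facts_for(entity, edges):
    """All facts touching an entity -- same store, same pass."""
    return [e["fact"] for e in edges if entity in (e["source"], e["target"])]

base_context = {h: facts_for(h, edges) for h in hubs(edges)}
```

In a split architecture, `hubs` would run against the graph and `facts_for` against a separate vector store with no shared IDs — which is precisely the gap the metadata contract cannot bridge.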
The real architecture is base context + selective RAG
After running this benchmark, I'm more convinced than ever: the future of AI memory isn't "retrieve everything via similarity search." It's:
- Pre-load the important stuff: use the graph structure to identify key entities, put their facts in the base context
- Use RAG for the long tail: specific memories, niche details, historical events that don't fit in the budget
- Track what's already in context: so RAG doesn't waste tokens re-retrieving facts the model already knows
This way, the model always has the structural backbone of the user's life. RAG extends it when needed. Latency stays low for the common case. And you avoid the top-k crowding problem because the important entities aren't competing in similarity search. They're already in context.
I won't go into detail today, but coding agents seem to follow a similar pattern: AGENTS.md sits permanently in context alongside the tool definitions, while skills search and code discovery handle the long tail on demand.
The Verdict
Mem0 is the right choice for most AI agents. If you need a reliable, mutable memory system with great fact extraction and you're doing standard similarity search, Mem0 is simpler, cheaper (40% fewer tokens), and well-maintained. For 90% of agents, the split architecture doesn't matter because you're not building base context from graph structure.
Graphiti is worth the cost for deeply interconnected companions. If you need to build a structural understanding of someone's life, know which entities are central, pre-load their context, track what's already known, and handle temporal evolution, Graphiti's unified architecture pays for itself. The extra tokens buy you co-located semantics that enable strategies Mem0's split stores can't support.
The hidden cost of context blindness isn't in the retrieval scores. It's in the connections the model never makes because the right context wasn't there.
The full benchmark (scripts, seed data, results, and the technical report) is open source. You can run it yourself, swap models, and see if your results match mine.
If you're building memory systems for AI agents, I'd love to hear how you approach this. What's working for you? What breaks at scale?

