PART 4 – Memory: The Hardest Part of Agents
This is where most agent systems fail in production. Memory is not a feature – it is the foundation.
4.1 The Three Memory Types
+------------------------------------------------------------------------+
| MEMORY ARCHITECTURE |
| |
| WORKSPACE MEMORY USER MEMORY GLOBAL MEMORY |
| (Per-conversation) (Per-user) (Shared/Long-term) |
| +------------------+ +-------------+ +----------------+ |
| | Global context | | Background | | Working DB | |
| | Char relations | | Personality | | Learning DB | |
| | Event develop. | | Hobbies | | Entertainment | |
| +------------------+ | Past demand | | ... | |
| +-------------+ +----------------+ |
| ^ |
| Retrieved via RAG |
+------------------------------------------------------------------------+

| Memory Type | What it stores | Lifespan | Access method |
|---|---|---|---|
| Workspace/Session | Current conversation context, scratchpad | This conversation only | Direct (session.state) |
| User Memory | Preferences, personality, past interactions | Across sessions, per-user | Lookup by user_id |
| Global Memory | Org-wide knowledge, learned facts, policies | Persistent | RAG / semantic search |
The Workplace Analogy
- Workspace Memory = your desktop right now. Temporary, specific to what you're working on.
- User Memory = your manager's notes about you from your last 3 reviews. Persists, personal.
- Global Memory = the company wiki. Everyone can query it. Retrieved when relevant.
4.2 ADK State Scopes – The Right Tool for Each Memory Type
In ADK, session.state is a single dictionary – but prefixes control scope and persistence:
# NO PREFIX - lives only in this session (conversation)
session.state['current_intent'] = 'book_flight'
# --> Equivalent to: Workspace Memory (temp scratchpad)
# user: prefix - persists per user across all sessions
session.state['user:preferred_language'] = 'fr'
# --> Equivalent to: User Memory
# app: prefix - shared across ALL users of this app
session.state['app:global_discount_code'] = 'SAVE10'
# --> Equivalent to: Shared config/feature flags
# temp: prefix - lives only within THIS invocation (not even across turns)
session.state['temp:raw_api_response'] = {}
# --> Throwaway scratch space - never persists

Storage Backends
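To make the scoping rules concrete, here is a minimal sketch of how a session service might route prefixed keys to different stores. This is plain Python, not ADK's actual implementation; the store names are illustrative:

```python
# Illustrative sketch of ADK-style state scoping (NOT the real ADK code).
class ScopedState(dict):
    """Routes writes to session, user, app, or throwaway stores by key prefix."""

    def __init__(self, session_store, user_store, app_store):
        super().__init__()
        self._session = session_store   # workspace memory: this conversation
        self._user = user_store         # user memory: persists across sessions
        self._app = app_store           # shared config across all users
        self._temp = {}                 # discarded after each invocation

    def __setitem__(self, key, value):
        if key.startswith("user:"):
            self._user[key] = value
        elif key.startswith("app:"):
            self._app[key] = value
        elif key.startswith("temp:"):
            self._temp[key] = value
        else:
            self._session[key] = value

    def __getitem__(self, key):
        for store in (self._temp, self._session, self._user, self._app):
            if key in store:
                return store[key]
        raise KeyError(key)

session, user, app = {}, {}, {}
state = ScopedState(session, user, app)
state["current_intent"] = "book_flight"      # session-scoped
state["user:preferred_language"] = "fr"      # survives this session
state["temp:raw_api_response"] = {}          # never persisted
```

The point of the sketch: a single dictionary interface, four different lifespans behind it.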
| Service | Persistence | Use When |
|---|---|---|
| InMemorySessionService | None (lost on restart) | Development/testing |
| DatabaseSessionService | Yes (your Postgres/MySQL) | Self-managed production |
| VertexAiSessionService | Yes (Google-managed, scalable) | Google Cloud production |
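The persistence difference can be demonstrated with a toy database-backed store. This is a stdlib sqlite3 sketch, not ADK's actual DatabaseSessionService, but it shows why database-backed state survives a process restart while in-memory state does not:

```python
import sqlite3

# Toy persistent key-value store for user-scoped state.
class SqliteStateStore:
    def __init__(self, path):
        # With a real file path, this data survives process restarts.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)"
        )

    def set(self, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO state (key, value) VALUES (?, ?)", (key, value)
        )
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute(
            "SELECT value FROM state WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

store = SqliteStateStore(":memory:")  # use a file path in real deployments
store.set("user:preferred_language", "fr")
```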
Recall Hook
No prefix = this chat · user: = this person · app: = everyone · temp: = this one call
4.3 Mem0 – Proactive Memory for Production Agents
Source: Mem0 Blog – Proactive Memory in AI Agents: A Developer's Guide (May 1, 2026)
Mem0 is a memory orchestration layer that solves a fundamental gap: agents today have excellent retrospective recall ("what did we discuss?") but zero prospective recall ("surface this when the context changes").
The Core Insight: Prospective vs Retrospective Memory
Retrospective (what most agents have): User asks โ agent searches memory โ responds. The user is always the trigger.
Prospective (what Mem0 enables): Context changes โ agent fires relevant memories before the user has to re-explain. Like a colleague who brings up last week's decision when you open the right file, not when you ask.
The cognitive science term is prospective memory – memory attached to a future context trigger, not a past query. "When I see auth.py, surface the OAuth blocker from last week."
The Research Behind This (2026)
ProMem (arXiv:2601.04463 – Yang et al., Jan 2026) identifies two structural failures in standard memory extraction:
- Feed-forward extraction: Memory is summarized before the agent knows what future tasks will need. Critical specifics get dropped because the extraction pass does not know they will matter later.
- One-off extraction: Single-pass. If something is missed, it is gone. No self-correction.
ProMem's fix: a two-stage self-questioning loop – Stage 1 extracts obvious facts, Stage 2 asks "What questions could a future user ask that I didn't capture?" then goes back to the transcript to answer those gaps.
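The loop can be sketched with the LLM calls stubbed out. The function names are illustrative, not ProMem's actual API; in a real system each stub would be an LLM call over the transcript:

```python
# Two-stage self-questioning extraction, with all LLM calls stubbed.
def extract_facts(transcript):
    # Stage 1 stub: an LLM would summarize the obvious facts.
    return [line for line in transcript if line.startswith("FACT:")]

def generate_gap_questions(facts):
    # Stage 2a stub: an LLM asks "what could a future user ask that I missed?"
    return ["What blockers were mentioned but not captured?"]

def answer_from_transcript(question, transcript):
    # Stage 2b stub: go BACK to the raw transcript to fill the gap,
    # recovering specifics the first pass dropped.
    return [line for line in transcript if "BLOCKER:" in line]

def promem_style_extract(transcript):
    memories = extract_facts(transcript)            # stage 1: obvious facts
    for q in generate_gap_questions(memories):      # stage 2: self-question
        memories += answer_from_transcript(q, transcript)
    return memories

transcript = [
    "FACT: project moved to Postgres",
    "user: still stuck on refresh tokens",
    "BLOCKER: OAuth token refresh fails in auth.py",
]
memories = promem_style_extract(transcript)
```

The structural fix is the second pass over the raw transcript: a single feed-forward extraction would have kept only the first fact and dropped the blocker.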
PASK (arXiv:2604.08000 – Xie et al., Apr 2026) addresses the surfacing side. Its central finding:
Most models are good at either intervening OR staying silent – but not both. Knowing when NOT to fire is as important as knowing when to fire.
PASK proposes a streaming IntentFlow classifier trained to make a binary DEMAND / NO_DEMAND decision on every turn โ reading the current message in context of recent history, not just keyword matching.
Three Proactive Memory Patterns
PATTERN 1 – Session-Start Scan
Session opens -> Context probe (time, open file, project name)
-> Mem0.search(query shaped around context signals)
-> Inject into system prompt
-> User speaks -> Agent already has context loaded

The query is not "find all memories" – it is a precise probe shaped around what you know about right now.
from datetime import datetime

def session_start_scan(memory, user_id, context, limit=3):
    hour = datetime.now().hour
    time_of_day = "morning" if hour < 12 else "afternoon" if hour < 17 else "evening"
    query = "What was this user recently working on or blocked by"
    if context.get("project"):
        query += f" in project {context['project']}"
    if context.get("open_file"):
        query += f" related to {context['open_file']}"
    query += f"? It is {time_of_day}."
    return memory.search(query, filters={"user_id": user_id}, top_k=limit)
# The difference:
# Reactive agent: "Hey, back on auth service. Where were we?" -> searches now
# Proactive agent: ALREADY loaded "blocked on OAuth token refresh in auth.py,
#                  identified await chain as cause. Project moved to Postgres."

Production use: Support agent that already knows the user's tier, last 3 tickets, and current blocker before they say a word. No new infrastructure – one async search call while the UI loads.
PATTERN 2 – Context-Trigger Scan
User message -> Intent classifier (LLM prompt, OUTSIDE Mem0)
|
DEMAND? --> Mem0.search(targeted query) -> Surface mid-conversation
NO_DEMAND? --> Skip search, continue normally

Trigger signals that indicate DEMAND:
- File reference: user mentions or opens a specific file
- Error detected: "exception", "failed", "broken" appears
- Tool invocation: about to deploy, run tests, do migration
- Topic shift: conversation moves to a clearly different domain
The PASK finding: Use a classifier, not keyword detection. Keywords miss context shifts that lack obvious signals and false-positive on resolved issues.
def detect_intent(chat, message, history):
    # Recent history + message -> LLM classifies DEMAND/NO_DEMAND
    # Returns: {decision: "DEMAND", query: "database.py pool config"}
    #      or: {decision: "NO_DEMAND", query: null}
    ...

# Regex fallback for obvious signals (file extensions, error words)
_DEMAND_PATTERNS = [
    r"\b[\w\-]+\.(py|ts|js|sql|yaml)\b",   # file reference
    r"\b(error|bug|blocker|failing)\b",    # error signals
    r"\bwhere we left off\b",              # continuation signal
]

Key insight: Memory search is not free. Firing it on every turn is wrong. The classifier is a gating layer – you pay for retrieval only when the conversation actually needs it.
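A minimal version of that gate, using only the regex fallback. In production the LLM classifier from detect_intent would sit in front of this; the Mem0 search call is stubbed here so the gating behavior stands alone:

```python
import re

_DEMAND_PATTERNS = [
    r"\b[\w\-]+\.(py|ts|js|sql|yaml)\b",   # file reference
    r"\b(error|bug|blocker|failing)\b",    # error signals
    r"\bwhere we left off\b",              # continuation signal
]

def gated_memory_search(message, search_fn):
    """Only pay for the retrieval round-trip when the message signals DEMAND."""
    if any(re.search(p, message, re.IGNORECASE) for p in _DEMAND_PATTERNS):
        return search_fn(message)   # DEMAND: fire a targeted search
    return None                     # NO_DEMAND: skip the search entirely

# Stub standing in for Mem0 search, recording when it actually fires.
calls = []
def fake_search(query):
    calls.append(query)
    return [f"memory relevant to: {query}"]

gated_memory_search("good morning!", fake_search)             # gate stays closed
gated_memory_search("auth.py is failing again", fake_search)  # gate opens
```

Small talk never touches the memory store; only the message with a file reference and an error signal pays for a search.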
PATTERN 3 – Scheduled Reflection Scan
[AFTER SESSION - async, offline]            [NEXT SESSION - zero LLM cost]
Session ends New session opens
-> Mem0.get_all() -> Mem0.search("[PROACTIVE]")
-> Reflection LLM reasons over -> Inject pre-computed results
all memories (OUTSIDE Mem0) into system prompt
-> Identify unresolved blockers, -> User speaks, agent is ready
pending tasks, forgotten decisions
-> Mem0.add([PROACTIVE] tags)

The ProMem self-questioning applied here: the reflection LLM asks:
- "What unresolved blockers should surface at next session start?"
- "What decisions made today should be remembered for next week?"
- "What context would this user need if they come back cold?"
def run_reflection(memory, chat, user_id):
    """Runs after session ends, asynchronously. NOT in the live conversation."""
    envelope = memory.get_all(user_id=user_id)
    memories = envelope.get("results", [])  # Note: paginated envelope
    # Reflection LLM pre-computes what to surface next session;
    # `chat` is assumed to turn the memories into a list of hint strings
    items = chat(memories)
    memory.add(
        [{"role": "system", "content": f"[PROACTIVE] {item}"} for item in items],
        user_id=user_id,
        metadata={"type": "proactive_hint"}
    )

def on_reflection_session_open(memory, user_id):
    """At next session start - zero LLM call needed."""
    return memory.search("[PROACTIVE]", user_id=user_id, limit=3)

The cost inversion: Expensive reasoning happens offline. At session start, you just do a cheap lookup of pre-computed answers. The user gets instant context at zero inference cost.
Production Failure Modes
| Failure | Cause | Fix |
|---|---|---|
| Noise injection | Session-start scan with too broad a query surfaces irrelevant memories | Use precision probe queries + relevance threshold – discard below threshold |
| Over-proactivity fatigue | Agent volunteers memories on every turn | Rate-limit: one injection at session start, one per significant context shift |
| Latency at session start | Search round-trip before first response | Run scan async while UI loads – memories arrive before first message |
| Extraction completeness vs. cost | ProMem self-questioning loop costs more tokens | Gate reflection depth on session length + importance signals |
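The noise-injection fix from the first row can be sketched as a simple threshold filter. The `score` field follows the shape of typical vector-search results (Mem0 included), but treat the field name and the 0.6 cutoff as assumptions to tune:

```python
RELEVANCE_THRESHOLD = 0.6  # illustrative value; tune per embedding model

def filter_relevant(results, threshold=RELEVANCE_THRESHOLD, max_items=3):
    """Discard memories below the relevance threshold, keep at most max_items."""
    scored = [r for r in results if r.get("score", 0.0) >= threshold]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:max_items]

results = [
    {"memory": "blocked on OAuth refresh", "score": 0.91},
    {"memory": "likes dark mode", "score": 0.35},        # below threshold: dropped
    {"memory": "project moved to Postgres", "score": 0.72},
]
kept = filter_relevant(results)
```

Anything that survives the filter goes into the prompt; anything below the cutoff is silently discarded rather than injected as noise.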
The Open Problem (2026)
The prospective memory primitive – memory with an attached trigger condition that fires when the condition is met – does not exist yet as a first-class primitive in any production memory system. Current patterns (tagging pre-computed hints with metadata) approximate it. A formal prospective_memory type is the next frontier.
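What such a primitive might look like, purely as speculation; no production memory system exposes anything like this today:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class ProspectiveMemory:
    """Hypothetical first-class primitive: a memory plus a trigger condition."""
    content: str
    trigger: Callable[[dict], bool]  # predicate over the current context
    fired: bool = False

    def check(self, context: dict) -> Optional[str]:
        # Fire once when the context matches, then retire.
        if not self.fired and self.trigger(context):
            self.fired = True
            return self.content
        return None

pm = ProspectiveMemory(
    content="OAuth blocker: token refresh fails, await chain suspected",
    trigger=lambda ctx: ctx.get("open_file") == "auth.py",
)
pm.check({"open_file": "main.py"})          # condition not met: stays silent
hint = pm.check({"open_file": "auth.py"})   # condition met: memory fires
```

The fire-once semantics also address over-proactivity fatigue: a triggered memory retires instead of re-surfacing on every matching turn.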
Recall Hook
Retrospective = user triggers · Prospective = context triggers · Three patterns: Preload → Gate → Reflect offline
4.4 ADK Context Types – Which One Do You Use Where?
The framework injects the right type of context depending on where your code runs. ADK chooses for you – but you need to know each one:
+--------------------------------------------------------------------+
| CONTEXT TYPE QUICK REFERENCE |
| |
| ReadonlyContext -> Dynamic instruction providers |
| CallbackContext -> Guardrails, logging, state mutation |
| ToolContext -> Tool functions (+ auth, memory search) |
| InvocationContext -> Core agent logic (_run_async_impl) |
+--------------------------------------------------------------------+

| Context | Read State | Write State | Artifacts | Auth/Memory | Where used |
|---|---|---|---|---|---|
| ReadonlyContext | Yes | No | No | No | Dynamic instructions |
| CallbackContext | Yes | Yes | Yes | No | Guardrail callbacks |
| ToolContext | Yes | Yes | Yes | Yes | Tool functions |
| InvocationContext | Yes | Yes | Yes | Yes | Core agent logic |
InvocationContext (ctx) โ what's inside:
ctx.session # Full conversation (history + state)
ctx.agent # Which agent is running
ctx.invocation_id # Unique ID for this run
ctx.artifact_service # Save/load files
ctx.end_invocation   # Set True to force-stop everything

Recall Hook
Read-only for instructions · Callback for guardrails · Tool for actions · Invocation for core logic
Sources
- Mem0 Blog – Proactive Memory in AI Agents: A Developer's Guide (May 1, 2026)
- ProMem: arXiv:2601.04463 – "Beyond Static Summarization: Proactive Memory Extraction for LLM Agents" (Yang et al., Jan 2026)
- PASK: arXiv:2604.08000 – "Toward Intent-Aware Proactive Agents with Long-Term Memory" (Xie et al., Apr 2026)
- Google ADK Documentation: Session State, Context Types
- Google ADK Documentation: Session State, Context Types
Implemented proactive memory in production? Share what worked โ real patterns from real systems welcome.