
PART 1: What Is an AI Agent?

Every section follows the same 4-part structure: The Intuition → The Technical Reality → The Production Trap → The Recall Hook


1.1 The Shift: Predictor → Actor

The Intuition

Old AI = a very smart autocomplete. You ask, it predicts. It has no awareness of the world, no plan, no actions. It's a parrot.

An Agent = a junior employee you give a goal to. They don't need you to tell them every step. They reason, use tools, check results, and adjust until the goal is done.

The shift is from "finish my sentence" to "go do the job."

The Technical Reality

An LLM alone is stateless and passive. An Agent adds:

  • A loop around the LLM (Think → Act → Observe → repeat)
  • Tools the LLM can call to interact with the real world
  • Memory so it remembers what it already did
  • Orchestration to manage when to think vs. when to act

Shortest definition you will ever need:

Agent = LLM in a loop + Tools to accomplish an objective
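That one-line definition can be sketched in a few lines of Python. Everything here is an illustrative stand-in, not a real SDK: `call_llm` represents the model call and `tools` is a plain dict of callables.

```python
# Minimal agent skeleton: an LLM in a loop with tools.
# call_llm() and tools are hypothetical stand-ins, not a real API.

def run_agent(goal, call_llm, tools, max_turns=10):
    history = [{"role": "user", "content": goal}]
    for _ in range(max_turns):          # hard cap so the loop cannot run forever
        decision = call_llm(history)    # Think: model sees the full history
        if decision["type"] == "final_answer":
            return decision["content"]  # Report: objective accomplished
        result = tools[decision["tool"]](**decision["args"])        # Act
        history.append({"role": "tool", "content": str(result)})    # Observe
    return "Stopped: max turns reached"
```

The loop body is the whole trick: the model either answers or requests a tool, and every tool result is fed back into the next Think step.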

The Production Trap

People think "agent" = "just add a prompt." It's not. An agent is a complete application with state management, tool execution, failure handling, and observability. Treat it like building a microservice, not writing a prompt.

Recall Hook

Agent = Employee, not Autocomplete. It has a mission, uses tools, and reports back.


1.2 The Anatomy of an Agent

+-----------------------------------------------------------+
|                         AI AGENT                          |
|                                                           |
|  MODEL (Brain)      TOOLS (Hands)    ORCHESTRATION        |
|  Reasons/plans      APIs, DBs,       Manages the loop,    |
|  Decides            Code exec,       memory, state,       |
|  Generates          Search, HITL     planning strategy    |
|                                                           |
|  DEPLOYMENT (The Body)                                    |
|  Monitoring  Logging  Scaling  A2A APIs                   |
+-----------------------------------------------------------+

Body Analogy

Component      | Body Part      | What it does
Model          | Brain          | Reasons, decides, plans
Tools          | Hands          | Acts on the world
Orchestration  | Nervous System | Manages the Think→Act→Observe loop
Deployment     | Body + Legs    | Gets the agent out into the world

Each Component in Detail

MODEL (The Brain)

  • The LLM. Its quality sets the agent's ceiling.
  • Choosing by benchmarks alone is a path to failure. Test on your own task metrics.
  • Model Routing (production pattern): use Gemini 2.5 Pro for complex planning, then route to Gemini 2.5 Flash for simple summarization. Same agent, 70% lower cost.
  • Agent Ops rule: models are superseded every 6 months. You need a CI/CD pipeline to swap brains without an architectural overhaul.
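The routing pattern above can be sketched as a small classifier in front of the model call. The keyword heuristic and `classify_task` / `pick_model` names are illustrative assumptions; real routers often use rules, a cheap classifier model, or both. The model names come from the text.

```python
# Model routing sketch: strong model for complex planning, cheap model for
# simple tasks. classify_task() is a hypothetical keyword heuristic.

COMPLEX_KEYWORDS = ("plan", "multi-step", "analyze", "debug")

def classify_task(prompt: str) -> str:
    text = prompt.lower()
    return "complex" if any(k in text for k in COMPLEX_KEYWORDS) else "simple"

def pick_model(prompt: str) -> str:
    # Same agent, two brains: pay for the pricey model only when needed.
    if classify_task(prompt) == "complex":
        return "gemini-2.5-pro"
    return "gemini-2.5-flash"
```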

TOOLS (The Hands)

  • Retrieving: RAG (docs), NL2SQL (databases) - grounds responses in facts, kills hallucinations
  • Executing: wrapped APIs (send email, update CRM), code execution in sandboxes
  • Human-in-the-Loop: ask_for_confirmation() - the agent pauses, a human approves, then it resumes
  • Tools are exposed via Function Calling. Standards: OpenAPI spec, MCP protocol.
  • Native tools: Gemini has Google Search built in as a native tool (baked into the LLM call itself).
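Function calling generally works by handing the model a machine-readable description of each tool and then executing whatever structured call it returns. The sketch below uses a simplified schema shape, not the exact OpenAPI or MCP wire format, and `send_email` is a hypothetical example tool.

```python
# Tool exposure sketch: describe a function so an LLM can request it,
# then dispatch the model's structured call. Schema shape is simplified.

def send_email(to: str, subject: str, body: str) -> str:
    # In production this would wrap a real email API.
    return f"sent to {to}: {subject}"

TOOL_SCHEMAS = [{
    "name": "send_email",
    "description": "Send an email to a recipient.",
    "parameters": {
        "to": {"type": "string"},
        "subject": {"type": "string"},
        "body": {"type": "string"},
    },
}]

TOOL_REGISTRY = {"send_email": send_email}

def dispatch(tool_call: dict) -> str:
    # tool_call is the model's output, e.g. {"name": ..., "args": {...}}
    fn = TOOL_REGISTRY[tool_call["name"]]
    return fn(**tool_call["args"])
```

The schemas go into the LLM request; the registry stays on your side, so the model can only ever request tools you chose to expose.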

ORCHESTRATION (The Nervous System)

  • Runs the Think → Act → Observe loop
  • Manages how the agent reasons: Chain-of-Thought, ReAct (Reason + Act interleaved)
  • Manages memory: what the agent knows right now (short-term) vs. what it should remember across sessions (long-term)
  • Design choice: No-code builders (fast, limited) vs. code-first frameworks like ADK (full control, production-grade)
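The short-term vs. long-term memory split in the bullets above can be sketched as a tiny class. This is an assumption-laden illustration: the in-memory dict stands in for what would be a database or vector store in production, and the class name is made up.

```python
# Memory sketch: short-term = recent conversation window fed back into the
# prompt; long-term = facts persisted across sessions. In-memory storage
# here is a stand-in for a real database or vector store.

class AgentMemory:
    def __init__(self, window: int = 10):
        self.window = window
        self.turns = []      # short-term: recent messages
        self.facts = {}      # long-term: key -> remembered fact

    def add_turn(self, role: str, content: str):
        self.turns.append({"role": role, "content": content})

    def context(self):
        # Only the last `window` turns go back into the prompt.
        return self.turns[-self.window:]

    def remember(self, key: str, fact: str):
        self.facts[key] = fact   # survives beyond the conversation window
```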

DEPLOYMENT (The Body)

  • Not just "put it on a server" - monitoring, logging, rate limiting, auth
  • Agents talk to users via a GUI, or to other agents via the A2A protocol

Recall Hook

Brain · Hands · Nervous System · Body - model, tools, orchestration, deployment.


1.3 The Think → Act → Observe Loop

The Intuition

Imagine a detective:

  1. Gets the mission (solve the case)
  2. Scans available evidence
  3. Thinks through the strategy
  4. Acts (interviews a suspect, checks records)
  5. Observes the result - updates their mental model
  6. Loops until the case is solved

That is exactly the agent loop.

The Technical Reality

🎯 1. MISSION - User goal arrives
🔍 2. SCAN - Load context + memory
🧠 3. THINK - LLM reasons, makes plan
⚡ 4. ACT - Call a tool / API
👁️ 5. OBSERVE - Get result, update state → loop back to THINK until done
✅ 6. REPORT - Final answer to user

The orchestration layer manages this loop. Each cycle:

  • The model sees: system prompt + conversation history + tool results so far
  • It decides: keep thinking? call a tool? give final answer?
  • The loop terminates when the agent decides it is done or a max turn limit is hit

ReAct Pattern (most common reasoning strategy):

Thought: "I need to find the halfway point first"
Action: maps_tool(origin="Mountain View", destination="SF")
Observation: "Halfway point is Millbrae"
Thought: "Now I search for coffee in Millbrae"
Action: search_tool(query="good coffee in Millbrae")
Observation: [results...]
Final Answer: "Try Blue Bottle in Millbrae"
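The orchestration layer has to turn traces like the one above back into executable calls. A minimal sketch of that step, assuming a plain-text `Action: tool(key="value")` format (the trace format is illustrative, not any specific framework's):

```python
import re

# ReAct step parser sketch: pull the tool name and keyword arguments out
# of an 'Action: tool(key="value", ...)' line emitted by the model.

ACTION_RE = re.compile(r'Action:\s*(\w+)\((.*)\)')

def parse_action(line: str):
    m = ACTION_RE.match(line.strip())
    if not m:
        return None  # a Thought / Observation / Final Answer line
    name, arg_str = m.group(1), m.group(2)
    args = dict(re.findall(r'(\w+)="([^"]*)"', arg_str))
    return name, args
```

In a real loop, a `None` result means the model is still thinking (or done), while a parsed action gets dispatched to the matching tool and its output is appended as the next Observation.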

The Production Trap

Loops can go infinite. Always set max_llm_calls (ADK default: 500). Also: the agent's Think step costs tokens on every iteration, so a poorly scoped task can burn through your budget fast.
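One way to enforce both caps is a small budget guard checked before every LLM call. The `max_llm_calls=500` default mirrors the ADK figure mentioned above; the class itself and the token limit are illustrative assumptions, not a real API.

```python
# Budget guard sketch: stop a runaway loop on either call count or token
# spend. 500 calls mirrors the text; the token cap is illustrative.

class LoopBudget:
    def __init__(self, max_llm_calls: int = 500, max_tokens: int = 200_000):
        self.max_llm_calls = max_llm_calls
        self.max_tokens = max_tokens
        self.calls = 0
        self.tokens = 0

    def charge(self, tokens_used: int) -> bool:
        """Record one LLM call; return False when the loop must stop."""
        self.calls += 1
        self.tokens += tokens_used
        return self.calls <= self.max_llm_calls and self.tokens <= self.max_tokens
```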

Recall Hook

Mission → Scan → Think → Act → Observe → Loop - the detective cycle.


Sources

  • Google ADK Whitepaper: Introduction to Agents
  • Google ADK Documentation: ai.google.dev/adk


Built from real deployments. Not theory.