jvinhit//lab

Agent Patterns: ReAct, Reflection & Planning — From One LLM Call to a Production Loop

ReAct, Reflection, and Planning for LLM agents — when to use each, guardrails against runaway loops, and links to tool use and orchestration.

Part 10 (final) of the Building AI Agents series. Previous: Function Calling & Tool Use.

Ten posts ago we counted tokens. By Part 9 we had wired up tools and parsed tool_calls. This final installment closes the loop: how you turn a single LLM call into an agent, meaning a runtime that repeats inference, executes side effects, and decides when to stop.

An agent is not a smarter prompt. It is an orchestration pattern: a loop over the model, tools, and memory with explicit termination rules. The three patterns below (ReAct, Reflection, and Planning) cover most production agent architectures you will ship or debug.


Interactive demo: step through three agent loops

The demo walks one canned task, “Find the cheapest flight SFO → NYC and book a Midtown hotel”, through ReAct, Reflection, and Planning. Press Run next step to animate each phase; no API keys, no network.

Open the full demo: /tools/agent-loop-demo/.


From one LLM call to an agent

A single-turn completion is request → response. An agent adds a control loop around that call:

┌─────────────────────────────────────────────────────────────────┐
│                        AGENT RUNTIME                            │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐  │
│  │  Prompt  │ →  │   LLM    │ →  │  Parse   │ →  │  Tools   │  │
│  │  builder │    │  infer   │    │  output  │    │  / env   │  │
│  └──────────┘    └──────────┘    └──────────┘    └────┬─────┘  │
│       ↑                                                │         │
│       └──────── append observations / memory ─────────┘         │
│                                                                 │
│  Terminate when: answer │ max_steps │ budget │ eval pass        │
└─────────────────────────────────────────────────────────────────┘

Every agent loop shares four ingredients:

Ingredient | Role | Series reference
Context | What the model sees this turn | Context Engineering & Memory
Tools | Side effects: search, write, execute | Function Calling & Tool Use
Stopping | When generation ends per step | Stopping Criteria & Output Control
Evaluation | Did the run succeed? | Evaluating LLMs & Agents

Mental model: The LLM is the CPU; your runtime is the OS, responsible for scheduling, I/O, memory, and kill signals.

Without the loop, tool schemas and memory design are inert. The pattern you choose determines how often you call the model, what each call does, and where failure can compound.
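
The control loop in the diagram can be sketched in a few lines. This is a minimal illustration, not a production runtime: `Step`, `run_agent`, and `scripted_model` are hypothetical names, and the "model" is a scripted stand-in so the example runs without an API key.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Step:
    """One parsed model turn: either a tool request or a final answer."""
    tool: Optional[str] = None
    args: dict = field(default_factory=dict)
    answer: Optional[str] = None


def run_agent(model: Callable[[list], Step],
              tools: dict,
              task: str,
              max_steps: int = 5) -> str:
    context = [f"Task: {task}"]                    # what the model sees each turn
    for _ in range(max_steps):                     # termination rule: step budget
        step = model(context)                      # inference + parsed output
        if step.answer is not None:                # termination rule: final answer
            return step.answer
        if step.tool not in tools:                 # reject unregistered tools
            context.append(f"Observation: unknown tool {step.tool!r}")
            continue
        result = tools[step.tool](**step.args)     # side effect in the environment
        context.append(f"Observation: {result}")   # feed the observation back
    raise RuntimeError("max_steps reached without an answer")


def scripted_model(context: list) -> Step:
    """Stand-in for an LLM: call one tool, then answer from the observation."""
    if not any(line.startswith("Observation:") for line in context):
        return Step(tool="search_flights", args={"origin": "SFO", "dest": "JFK"})
    return Step(answer=f"Done after seeing: {context[-1]}")


tools = {"search_flights": lambda origin, dest: f"{origin}->{dest} Alaska $318"}
result = run_agent(scripted_model, tools, "cheapest flight")
print(result)
```

Swapping `scripted_model` for a real LLM call changes nothing about the loop itself, which is the point of the CPU/OS analogy.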


ReAct: interleaved reasoning and acting

ReAct (Reason + Act) alternates explicit reasoning traces with tool invocations. The canonical text format from Yao et al.:

Thought: I need current flight prices before recommending.
Action: search_flights(origin="SFO", dest="JFK", depart="2026-06-12")
Observation: [{"airline":"Alaska","price":318}, ...]
Thought: Alaska is cheapest. Search hotels next.
Action: search_hotels(location="Midtown NYC", ...)
Observation: [{"name":"Yotel","rate":142}, ...]
...
Answer: Booked Alaska AS418 + Yotel. Total ~$744.

Runtime contract

Your orchestrator must enforce three rules:

  1. Stop before Observation: and inject real tool output; never let the model hallucinate observations.
  2. Parse Action: and map it to a registered tool, or reject it.
  3. Append the full trace to context for the next turn.

Part 4 covered stop sequences for exactly this handoff:

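from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# messages holds the running trace: system prompt, user task, prior Thought/Action/Observation text
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    stop=["\nObservation:"],  # generation halts here; the runtime injects the real Observation after the tool runs
    max_tokens=512,
)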

With native function calling (Part 9), the same loop uses tool_calls instead of text parsing, but the Thought → Action → Observation rhythm remains.
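
For the text format, rule 2 of the runtime contract (parse Action:, then map to a registered tool or reject) might look like the sketch below. It assumes actions are written as `tool_name(key=literal, ...)` with keyword arguments only; `parse_action` and `dispatch` are hypothetical helper names. Arguments are read with `ast.literal_eval`, so nothing the model writes is ever executed as code.

```python
import ast


def parse_action(text: str) -> tuple:
    """Return (tool_name, kwargs) from the last 'Action: name(k=v, ...)' line."""
    lines = [l for l in text.splitlines() if l.startswith("Action:")]
    if not lines:
        raise ValueError("no Action line found")
    expr = lines[-1][len("Action:"):].strip()
    node = ast.parse(expr, mode="eval").body       # may raise SyntaxError
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a tool call: {expr!r}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("only keyword arguments are supported")
    # literal_eval accepts only literals (str, int, list, dict, ...).
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs


def dispatch(text: str, registry: dict) -> str:
    """Run the requested tool, or return an Observation explaining the rejection."""
    try:
        name, kwargs = parse_action(text)
    except (ValueError, SyntaxError) as exc:
        return f"Observation: could not parse action ({exc})"
    if name not in registry:
        return f"Observation: unknown tool {name!r}; available: {sorted(registry)}"
    return f"Observation: {registry[name](**kwargs)}"
```

Returning the rejection as an Observation, rather than raising, gives the model a chance to correct itself on the next turn; a stricter runtime could count rejections and abort after a few.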

When ReAct wins

Scenario | Why ReAct
Unknown tool sequence | Next action depends on prior observation
Live data (APIs, DB, web) | Cannot plan all calls upfront
Debugging | Text trace is human-readable
Mixed reasoning + I/O | Model decides when it has enough evidence

ReAct failure modes

  • Loop thrashing: repeated identical searches; fix with dedup hashes or step memory.
  • Premature Answer: the model answers before calling required tools; fix with schema validation or eval gates.
  • Context bloat: every trace grows the window; summarize or trim per Part 5.
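
One cheap way to limit context bloat is to keep the most recent observations verbatim and truncate older ones. The sketch below assumes the trace is a list of text lines in the Thought/Action/Observation format above; `compress_trace` and its defaults are illustrative, and a real system might summarize with an LLM instead of truncating.

```python
def compress_trace(trace: list, keep_recent: int = 2, max_chars: int = 40) -> list:
    """Truncate all but the last `keep_recent` Observation lines to `max_chars`."""
    obs_idx = [i for i, line in enumerate(trace) if line.startswith("Observation:")]
    # Indices of observations old enough to be shortened.
    old = set(obs_idx[:-keep_recent]) if keep_recent else set(obs_idx)
    out = []
    for i, line in enumerate(trace):
        if i in old and len(line) > max_chars:
            out.append(line[:max_chars] + " ...[truncated]")
        else:
            out.append(line)   # Thoughts, Actions, and recent Observations stay intact
    return out
```

Thoughts and Actions are left alone here because they are usually short and carry the reasoning the model needs to avoid repeating itself.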

Reflection: draft, critique, revise

Reflection (and Reflexion) adds a second pass in which the model evaluates its own output before returning it. No tools are required; it is a pure generation loop.

Pass 1 (Draft):     "United $342 + Hampton $189/n ≈ $909."
Pass 2 (Critique):  "Missed Alaska $318. Hotel over budget. No booking confirmation."
Pass 3 (Revised):   "Alaska $318 + Yotel $142/n = ~$744. PNR ABC123, HT-8842."

Implementation sketch

# llm(instruction, content) and needs_revision(critique) are placeholders for your own helpers.
draft = llm("Answer the user query.", user_query)
critique = llm(
    "Review this draft. List factual errors, missing constraints, and quality gaps.",
    draft,
)
if needs_revision(critique):  # heuristic or classifier
    # Pass the draft and the critique together so the reviser sees both.
    final = llm(
        "Revise using the critique. Return only the corrected answer.",
        f"Draft:\n{draft}\n\nCritique:\n{critique}",
    )
else:
    final = draft

Reflexion extends this across episodes: store critiques in long-term memory so future runs avoid repeating the same mistakes.
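
The cross-episode memory can be as simple as a file of lessons that is prepended to the next run's prompt. This is a rough sketch under that assumption: `LessonStore`, the JSON file layout, and the prompt wording are all hypothetical, and a production system would more likely use a database or vector store.

```python
import json
import tempfile
from pathlib import Path


class LessonStore:
    """Persist critique-derived lessons between runs in a JSON file."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self) -> list:
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text(encoding="utf-8"))

    def add(self, lesson: str) -> None:
        lessons = self.load()
        if lesson not in lessons:                  # skip exact duplicates
            lessons.append(lesson)
            self.path.write_text(json.dumps(lessons, indent=2), encoding="utf-8")

    def as_prompt(self, limit: int = 5) -> str:
        """Render the most recent lessons as a prefix for the next run's prompt."""
        lessons = self.load()[-limit:]
        if not lessons:
            return ""
        return "Lessons from earlier runs:\n" + "\n".join(f"- {l}" for l in lessons)


with tempfile.TemporaryDirectory() as tmp:
    store = LessonStore(Path(tmp) / "lessons.json")
    store.add("Compare at least three airlines before quoting a fare.")
    store.add("Compare at least three airlines before quoting a fare.")  # ignored
    store.add("Check the hotel rate against the budget.")
    prompt_prefix = store.as_prompt()

print(prompt_prefix)
```

The `limit` parameter matters: unbounded lesson lists become their own source of context bloat.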

When Reflection wins

Scenario | Why Reflection
Writing, summarization, code review | Quality > latency; no live I/O
Constraint-heavy answers | Self-critique catches missed requirements
Cheap model + one retry | Two small calls beat one huge ReAct trace
Post-tool synthesis | ReAct gathers data; Reflection polishes the answer

Cost trade-off: Reflection adds 1–2 extra LLM calls but often reduces total steps compared with a ReAct agent that wanders.


Planning: decompose first, execute second

Plan-and-execute separates planning from execution. The planner emits numbered subtasks; workers (the same model or different ones) execute them; a synthesizer merges the results.

Plan:
  1. Search flights SFO→JFK Jun 12–15
  2. Select lowest fare
  3. Search Midtown hotels ≤ $200/n
  4. Book flight + hotel
  5. Return summary with confirmations

Execute 1 → Execute 2 → Execute 3 → Synthesize

Variants

Pattern | Planner output | Executor behavior
Plan-and-execute | Natural-language subtasks | LLM + tools per subtask
ReWOO | Tool call plan upfront | Workers run tools without intermediate LLM
Hierarchical | Tree of goals | Sub-planners for deep tasks
LATS / tree search | Branching plans | Explore multiple paths, prune by eval

ReWOO (Reasoning WithOut Observation in the loop) front-loads all tool calls: fewer LLM round-trips, but brittle when step n+1 depends on the output of step n.
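
A ReWOO-style executor can be sketched as a plan whose later steps refer to earlier results through placeholders, resolved by the runtime with no LLM call in between. The `#E1` placeholder syntax, the `PLAN` tuple layout, and the tool names below are assumptions made for illustration, and the tools return canned data.

```python
import re

# Planner output, produced in one LLM call (hard-coded here):
# (evidence id, tool name, arguments). "#E1" refers to the result of step E1.
PLAN = [
    ("E1", "search_flights", {"origin": "SFO", "dest": "JFK"}),
    ("E2", "cheapest", {"options": "#E1"}),
]


def resolve(value, evidence: dict):
    """Replace a '#E<n>' placeholder with the stored result of that step."""
    if isinstance(value, str) and re.fullmatch(r"#E\d+", value):
        return evidence[value[1:]]
    return value


def run_plan(plan: list, tools: dict) -> dict:
    evidence = {}
    for step_id, tool, args in plan:       # workers run without consulting the LLM
        bound = {k: resolve(v, evidence) for k, v in args.items()}
        evidence[step_id] = tools[tool](**bound)
    return evidence


tools = {
    "search_flights": lambda origin, dest: [
        {"airline": "United", "price": 342},
        {"airline": "Alaska", "price": 318},
    ],
    "cheapest": lambda options: min(options, key=lambda o: o["price"]),
}
evidence = run_plan(PLAN, tools)
print(evidence["E2"])
```

The brittleness is visible in the structure: the planner had to know that a `cheapest` step would be meaningful before seeing any flight data. If step E1 returned an empty list, nothing in this loop could adapt.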

When Planning wins

  • Known workflow: onboarding checklists, ETL pipelines, CI steps.
  • Parallelizable subtasks: independent searches across sources.
  • Human approval gates: the plan is reviewable before execution.
  • Cost control: one cheap planner plus targeted worker calls.
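
When subtasks are independent, the executor can fan them out concurrently. A minimal sketch using the standard library's thread pool, which suits I/O-bound tool calls; `run_parallel` and the subtask names are hypothetical, and the lambdas stand in for real searches.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_parallel(subtasks: dict, max_workers: int = 4) -> dict:
    """Run independent zero-argument subtasks concurrently; return results by name."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in subtasks.items()}
        # .result() blocks until done and re-raises any exception from the worker.
        return {name: future.result() for name, future in futures.items()}


subtasks: dict = {
    "flights": lambda: {"airline": "Alaska", "price": 318},
    "hotels": lambda: {"name": "Yotel", "rate": 142},
}
results = run_parallel(subtasks)
print(results)
```

Results are keyed by subtask name rather than completion order, so the synthesizer sees a stable structure regardless of which call finishes first.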

Planning failure modes

  • Stale plan: the executor discovers step 2 is impossible; you need a re-planning loop.
  • Over-decomposition: 20 micro-steps burn tokens; merge where safe.
  • Plan hallucination: the planner invents tools; validate against the registry (Part 9).
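
Two of these failure modes can be caught before any step runs. The sketch below assumes the planner emits a list of dicts with a `tool` key; that shape, the `validate_plan` name, and the `max_steps` threshold are illustrative choices, not a standard.

```python
def validate_plan(plan: list, registry: set, max_steps: int = 8) -> list:
    """Return a list of problems; an empty list means the plan may execute."""
    errors = []
    if len(plan) > max_steps:                       # over-decomposition
        errors.append(f"plan has {len(plan)} steps; limit is {max_steps}")
    for i, step in enumerate(plan, start=1):
        tool = step.get("tool")
        if tool not in registry:                    # plan hallucination
            errors.append(f"step {i}: unknown tool {tool!r}")
    return errors


registry = {"search_flights", "search_hotels", "book"}
plan = [
    {"tool": "search_flights", "args": {"origin": "SFO", "dest": "JFK"}},
    {"tool": "teleport", "args": {}},               # invented by the planner
]
problems = validate_plan(plan, registry)
print(problems)
```

A non-empty result can be fed back to the planner as a re-planning prompt instead of failing the run outright.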

Reasoning models and when explicit CoT is redundant

Reasoning models (o-series, DeepSeek-R1, QwQ, etc.) internalize chain-of-thought during training. They emit extended thinking tokens before the visible answer.

Approach | Explicit Thought: scaffolding | Reasoning model
ReAct text format | Required for interpretability | Often redundant; use tools + final answer
Reflection critique | Still valuable: external pass catches blind spots | Draft may already be strong; critique remains useful
Planning | Planner can be a reasoning model | Strong at decomposition; watch cost per plan token
Debugging | Harder: thinking may be hidden | Log reasoning_content if API exposes it

Rule of thumb: If the model already “thinks” internally, don’t double-pay for verbose Thought: prefixes unless you need audit logs or stop-sequence handoffs.

For model selection trade-offs, see Choosing a Model. For sampling stability during multi-step loops, see Part 2.


Combining patterns in production

Real agents rarely use one pattern exclusively. Common compositions:

Planning → ReAct (per subtask) → Reflection (final polish)
         ↘ ReWOO (parallel I/O) ↗

Layer | Pattern | Example
Top-level orchestrator | Planning | “Research competitor → draft report → send email”
Subtask executor | ReAct | Live web search + scrape within one subtask
Output gate | Reflection | Critique draft before user sees it
Quality assurance | Eval (Part 7) | Regression suite on golden tasks

Tool use (Part 9) plugs into any pattern at the execution layer. Memory (Part 5) decides which prior plans, critiques, and observations survive into the next turn.


Multi-agent orchestration

When a single loop is insufficient, multi-agent architectures assign roles:

  • Orchestrator + workers: a planner delegates to specialized agents (Orchestrator Pattern).
  • Pipeline: sequential handoffs with typed outputs.
  • Debate / verifier: one agent proposes, another critiques (Reflection as architecture).

For system-level topology (state machines, message buses, shared scratchpads), see Agent Architecture Deep Dive.

Each sub-agent still runs one of the three patterns internally. The orchestrator’s job is routing, not replacing ReAct/Reflection/Planning.


Choosing a pattern by task

Task shape | Start here | Avoid
Open-ended research, unknown steps | ReAct | Rigid upfront plan
Single-shot quality (email, summary, review) | Reflection | Unnecessary tool loops
Repeatable workflow, SOP | Planning | ReAct wandering
High parallelism, stable tool graph | ReWOO / Planning | Step-by-step ReAct
Long horizon, branching decisions | Hierarchical planning + eval | Flat ReAct without budgets
Coding agent in IDE | ReAct + Reflection on diff | Plan-only without file I/O

Ask three questions before shipping:

  1. Does the next action depend on live observations? → ReAct.
  2. Is the output graded on quality, not tool count? → Reflection.
  3. Can you write the steps before running? → Planning.
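
The three questions translate directly into a small routing function. This is only an illustration of the checklist: `choose_pattern` is a hypothetical helper, the output order follows the Planning → ReAct → Reflection composition described earlier, and the fallback to a single LLM call when every answer is "no" is an assumption.

```python
def choose_pattern(needs_live_observations: bool,
                   graded_on_quality: bool,
                   steps_known_upfront: bool) -> list:
    """Map the three shipping questions to a list of patterns to compose."""
    patterns = []
    if steps_known_upfront:          # question 3
        patterns.append("Planning")
    if needs_live_observations:      # question 1
        patterns.append("ReAct")
    if graded_on_quality:            # question 2
        patterns.append("Reflection")
    return patterns or ["single LLM call"]


print(choose_pattern(needs_live_observations=True,
                     graded_on_quality=True,
                     steps_known_upfront=False))
```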

Failure modes and guardrails

Agents fail differently from single-turn LLMs. Production guardrails from Parts 4, 5, 7, and 9:

Failure | Symptom | Guardrail
Infinite loop | Same Action repeated | max_steps, action dedup, diminishing returns detector
Hallucinated tools | Action calls unknown function | Tool registry whitelist; reject + retry prompt
Runaway cost | Token budget blown in one run | Per-run $ cap, cumulative token counter, step pricing alerts
Context overflow | Truncated mid-trace | Rolling summary, observation compression (Part 5)
Silent wrong answer | Confident but incorrect | Eval harness (Part 7), Reflection gate, human-in-the-loop
Unsafe tool call | Destructive write/delete | Permission tiers, confirmation UI, sandboxed execution

MAX_STEPS = 12
MAX_TOKENS_PER_RUN = 80_000

# llm, registry, canonicalize, BudgetExceeded, and MaxStepsExceeded are assumed to be defined by your runtime.
def run_agent(messages: list[dict]) -> str:
    total_tokens = 0                    # cumulative usage across the whole run
    seen_actions: set[str] = set()      # canonical keys of tool calls already executed

    for step in range(MAX_STEPS):
        response = llm(messages, tools=registry)
        total_tokens += response.usage.total_tokens
        if total_tokens > MAX_TOKENS_PER_RUN:
            raise BudgetExceeded()

        if response.tool_calls:
            action_key = canonicalize(response.tool_calls)
            if action_key in seen_actions:
                # Nudge the model instead of re-running the same call; this still consumes a step.
                messages.append({"role": "user", "content": "Duplicate action. Try a different approach or answer."})
                continue
            seen_actions.add(action_key)
            # execute tools, append observations ...
        else:
            return response.content  # final answer

    raise MaxStepsExceeded()
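
The loop above leaves `canonicalize` undefined. One possible implementation hashes a normalized form of the tool calls so that argument order and call order do not matter. It assumes each call is a plain dict with `name` and `arguments` keys, where `arguments` is already a dict; SDKs that return arguments as a JSON string would need a `json.loads` first.

```python
import hashlib
import json


def canonicalize(tool_calls: list) -> str:
    """Return a stable hash for a batch of tool calls, ignoring key and call order."""
    normalized = sorted(
        json.dumps({"name": c["name"], "arguments": c["arguments"]}, sort_keys=True)
        for c in tool_calls
    )
    return hashlib.sha256("\n".join(normalized).encode("utf-8")).hexdigest()


a = canonicalize([{"name": "search", "arguments": {"q": "SFO JFK", "n": 5}}])
b = canonicalize([{"name": "search", "arguments": {"n": 5, "q": "SFO JFK"}}])
c = canonicalize([{"name": "search", "arguments": {"q": "SFO EWR", "n": 5}}])
print(a == b, a == c)
```

Exact-match hashing misses near-duplicates such as the same query with different whitespace; normalizing argument values before hashing is a reasonable next step if thrashing persists.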

Ship criterion: An agent without max_steps, budget tracking, and eval regression tests is a demo, not production.


Prompt scaffolding reference

Minimal system prompts for each pattern (adapt to your stack):

# ReAct
You have tools: {tool_list}. After each Action, stop. The user will provide Observation.
Format: Thought: ... / Action: tool_name(args) / (wait for Observation) / ... / Answer: ...

# Reflection
Pass 1: Produce a complete draft.
Pass 2: Critique the draft — list errors, omissions, constraint violations.
Pass 3: Produce a revised answer addressing every critique point.

# Planning
Pass 1: Output a numbered plan of subtasks. Do not execute yet.
Pass 2+: Execute subtask N. Return structured result.
Final: Synthesize all subtask results into one answer.

Part 3 (Prompt Engineering for Agents) covers role prompts, few-shot exemplars, and output formatting that make these scaffolds reliable.


Series wrap-up: the full stack

You now have the complete picture for building LLM agents:

Tokens & window (1) → Sampling (2) → Prompts (3) → Stopping (4)

Context & memory (5) → RAG vs fine-tune (6) → Eval (7) → Model choice (8)

Tools (9) → Agent patterns: ReAct · Reflection · Planning (10)

Multi-agent orchestration · production guardrails · continuous eval

The through-line: every layer is a contract. Tokens budget what fits. Sampling controls variance. Stopping defines step boundaries. Context decides what persists. Tools define side effects. Patterns define control flow. Eval proves it works under change.

Start with the simplest pattern that fits: usually Reflection for text-only tasks, ReAct when tools are required, Planning when the SOP is known. Add complexity only when evals prove the simpler loop fails.

Final note: The best agent architecture is the one your team can observe, test, and roll back, not the one with the most pattern names in the README.


The Building AI Agents series

  1. Tokens & Context Windows
  2. Sampling: temperature, top_p, top_k
  3. Prompt Engineering for Agents
  4. Stopping Criteria & Output Control
  5. Context Engineering & Memory
  6. Fine-tuning vs Prompting vs RAG
  7. Evaluating LLMs & Agents
  8. Choosing a Model
  9. Function Calling & Tool Use
  10. Agent Patterns: ReAct, Reflection, Planning (current)