Everything in Parts 1-4 describes a pipeline: a fixed sequence of steps executed once per query. Agentic RAG breaks that assumption: the model itself decides whether and what to retrieve, evaluates what comes back, and then either answers or loops again.
The loop diagram above shows the core pattern. The dashed return arrow is the loop-back when more information is needed; the green exit path fires when the evaluation step determines the answer is sufficient.
The Reason → Act → Observe → Evaluate loop
The model reasons about what it needs. It acts: querying a vector database, calling a SQL API, running a web search, or invoking a calculator, whichever the reasoning step called for. It observes the result. It evaluates whether it now has enough information; if not, the loop runs again. The "sufficient" exit is the critical design element: without it, you don't have an agent, just an infinite retry mechanism.
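A minimal Python sketch of the loop, assuming the reasoner, tools, evaluator, and answer generator are plain callables (in practice each would wrap an LLM call or a retrieval backend). The names `run_agent`, `Step`, and `AgentResult`, and the toy stubs at the bottom, are hypothetical and not any framework's API. The `max_iterations` cap keeps the sketch from spinning forever when the sufficiency check never passes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Each tool maps a query string to an observation string. These stand in for
# a vector DB, SQL API, web search, or calculator (hypothetical stubs here).
Tool = Callable[[str], str]
# reason(question, context) -> (thought, tool_name, tool_query)
Reasoner = Callable[[str, List[str]], Tuple[str, str, str]]
# evaluate(question, context) -> True when the context is sufficient to answer
Evaluator = Callable[[str, List[str]], bool]
# answer(question, context) -> final answer text
Answerer = Callable[[str, List[str]], str]


@dataclass
class Step:
    thought: str
    tool: str
    query: str
    observation: str


@dataclass
class AgentResult:
    answer: Optional[str]
    steps: List[Step] = field(default_factory=list)
    stop_reason: str = ""


def run_agent(question: str, reason: Reasoner, tools: Dict[str, Tool],
              evaluate: Evaluator, answer: Answerer,
              max_iterations: int = 5) -> AgentResult:
    context: List[str] = []
    steps: List[Step] = []
    for _ in range(max_iterations):
        thought, tool_name, query = reason(question, context)  # Reason
        observation = tools[tool_name](query)                 # Act
        context.append(observation)                            # Observe
        steps.append(Step(thought, tool_name, query, observation))
        if evaluate(question, context):                       # Evaluate: the exit
            return AgentResult(answer(question, context), steps, "sufficient")
    # No "sufficient" exit within the budget: stop instead of retrying forever.
    return AgentResult(None, steps, "max_iterations")


# Toy usage with hypothetical stubs: consult docs first, then SQL.
tools = {
    "docs": lambda q: f"doc snippet for '{q}'",
    "sql": lambda q: f"sql rows for '{q}'",
}


def toy_reason(question: str, context: List[str]) -> Tuple[str, str, str]:
    tool = "docs" if not context else "sql"
    return (f"need {tool}", tool, question)


result = run_agent(
    "Q3 revenue for product X?",
    reason=toy_reason,
    tools=tools,
    evaluate=lambda q, ctx: len(ctx) >= 2,
    answer=lambda q, ctx: " | ".join(ctx),
)
print(result.stop_reason, [s.tool for s in result.steps])
```

Keeping `evaluate` as its own callable makes the exit condition explicit and testable in isolation, which is where most agent bugs surface.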
Self-correction: Self-RAG and Corrective RAG (CRAG)
A lighter-weight pattern, and often a better starting point than a full agentic loop: a relevance grader scores retrieved chunks before generation happens. If confidence is low, the system reformulates the query or falls back to web search and retries, all before the LLM ever generates a potentially hallucinated answer grounded in bad context.
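The grade-before-generate pattern can be sketched as follows; it is closest in spirit to CRAG's retrieval evaluator. This assumes a `grade(query, chunk)` function that returns a score in [0, 1]. The keyword-overlap grader, the 0.5 threshold, the single retry, and the name `corrective_retrieve` are all hypothetical, illustrative stand-ins; a real grader would typically be an LLM prompt or a cross-encoder.

```python
from typing import Callable, List, Tuple

Retriever = Callable[[str], List[str]]   # query -> candidate chunks
Grader = Callable[[str, str], float]     # (query, chunk) -> relevance in [0, 1]


def corrective_retrieve(query: str, retrieve: Retriever, grade: Grader,
                        reformulate: Callable[[str], str],
                        web_search: Retriever,
                        threshold: float = 0.5,
                        max_retries: int = 1) -> Tuple[List[str], str]:
    """Grade chunks before any generation; reformulate and retry, then fall
    back to web search if nothing clears the threshold."""
    q = query
    for attempt in range(max_retries + 1):
        relevant = [c for c in retrieve(q) if grade(q, c) >= threshold]
        if relevant:
            return relevant, "vector_store"
        if attempt < max_retries:
            q = reformulate(q)  # low confidence: rewrite the query and retry
    # Nothing passed the grader: fall back to web search with the original query.
    return web_search(query), "web_search"


# Hypothetical stand-ins for a real retriever, grader, and query rewriter.
CORPUS = ["refund policy allows returns within 30 days", "shipping takes 5 days"]


def overlap_grade(query: str, chunk: str) -> float:
    # Fraction of query words that appear in the chunk: a crude proxy for an
    # LLM or cross-encoder relevance grader.
    q, c = set(query.lower().split()), set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0


def toy_reformulate(q: str) -> str:
    return q.replace("money back", "refund policy")


def toy_web_search(q: str) -> List[str]:
    return [f"web result for '{q}'"]


chunks, source = corrective_retrieve("money back rules", lambda q: CORPUS,
                                     overlap_grade, toy_reformulate, toy_web_search)
print(source, chunks)
```

Here the first pass finds nothing relevant, the rewritten query ("refund policy rules") clears the threshold, and only the passing chunk is handed to generation.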
When this earns its complexity
It pays off in three situations: multi-hop reasoning that requires several retrieval steps, each with a different query; questions spanning structured, unstructured, and real-time sources, where the right combination isn't knowable in advance; and contexts that can absorb 3-10× the LLM calls of a single-pass system, since both latency and cost multiply with every loop iteration.
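A back-of-the-envelope model of that multiplication, under explicitly assumed costs: each loop iteration makes one reasoning call and one evaluation call, and one final generation call follows the exit, compared against a single call for the single-pass pipeline. The per-call latency is a hypothetical placeholder, not a benchmark.

```python
# Assumption: single-pass RAG makes exactly 1 generation call per query.
SINGLE_PASS_CALLS = 1
PER_CALL_LATENCY_S = 1.5  # hypothetical average, not a measurement


def agentic_llm_calls(iterations: int, calls_per_iteration: int = 2,
                      final_calls: int = 1) -> int:
    """LLM calls for one query, assuming each iteration makes a reason call
    and an evaluate call, plus one final answer call after the loop exits."""
    return iterations * calls_per_iteration + final_calls


def overhead_multiplier(iterations: int) -> float:
    return agentic_llm_calls(iterations) / SINGLE_PASS_CALLS


for k in range(1, 5):
    calls = agentic_llm_calls(k)
    # Loop calls run sequentially, so wall-clock latency adds up like cost does.
    print(f"{k} iteration(s): {calls} calls, {overhead_multiplier(k):.0f}x, "
          f"~{calls * PER_CALL_LATENCY_S:.1f}s")
```

Under these assumptions, one to four iterations land at 3× to 9× the single-pass call count, and because the calls are sequential, latency grows roughly in step with cost.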
Non-negotiable minimum architecture: never ship an agentic loop without a hard max-iteration limit, a cost ceiling, and full execution tracing. These aren't hardening steps to add later; they belong in the first version you ship.
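One way to make all three guardrails structural rather than optional is to route every loop step through a single guard object. This is a minimal sketch: `LoopGuard` and `BudgetExceeded` are hypothetical names, the per-step costs are made-up placeholders, and the trace is an in-memory list of dicts where a production system would send spans to a tracing backend.

```python
import json
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List


class BudgetExceeded(RuntimeError):
    """Raised when the loop hits its iteration limit or cost ceiling."""


@dataclass
class LoopGuard:
    max_iterations: int
    max_cost_usd: float
    iterations: int = 0
    cost_usd: float = 0.0
    trace: List[Dict[str, Any]] = field(default_factory=list)

    def next_iteration(self) -> None:
        """Call at the top of each loop pass; enforces the hard iteration cap."""
        if self.iterations >= self.max_iterations:
            raise BudgetExceeded(f"max iterations ({self.max_iterations}) reached")
        self.iterations += 1

    def record(self, step: str, cost_usd: float, **details: Any) -> None:
        """Append a trace entry for every step, then enforce the cost ceiling."""
        self.cost_usd += cost_usd
        self.trace.append({"ts": time.time(), "iteration": self.iterations,
                           "step": step, "cost_usd": cost_usd, **details})
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded(f"cost ceiling ${self.max_cost_usd:.2f} exceeded")


# Usage: a loop whose "sufficient" exit never fires still terminates.
guard = LoopGuard(max_iterations=3, max_cost_usd=1.00)
try:
    while True:
        guard.next_iteration()
        guard.record("retrieve", cost_usd=0.01, query="placeholder query")
        guard.record("evaluate", cost_usd=0.02, sufficient=False)
except BudgetExceeded as exc:
    stop_reason = str(exc)

print(stop_reason)
print(json.dumps(guard.trace[-1]))
```

The cost check runs after a step's spend is recorded, so the last step can overshoot the ceiling slightly; checking an estimated cost before each call is the stricter variant.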