How to Think About AI Systems Before You Build One
You have used ChatGPT, Claude, or Copilot. You have seen what they can do. Now you want to understand how to design a system that uses AI — not just call it. This article builds the mental architecture for that.
Who this is for
You have used AI tools. You may have read from-zero-to-building and understand software fundamentals. Now you want to understand the AI layer: how AI systems are structured, how they decide what to do, and why some feel smart while others feel brittle.
What this article is NOT
This is not a coding tutorial. This is a design thinking article — it teaches the mental models that underpin every well-built AI system. The implementation details come later.
Part 1 — What makes a system “agentic”
When you type a question into ChatGPT and get a response, you are using a chatbot. One message in, one message out. The model does not plan, does not use tools, does not remember what happened yesterday.
An agentic system goes further. It can break a goal into sub-tasks, select tools, execute actions, evaluate results, and adjust its approach — all within boundaries set by its designer.[^1] The difference is not intelligence. It is architecture.
Anthropic, who build Claude, put it clearly: “workflows are systems where LLMs are used inside predefined code paths, whereas agents are systems where the model dynamically directs its own process and tool use.”[^1]
```mermaid
graph LR
    A[Chatbot] -->|add tool use| B[Tool-Augmented LLM]
    B -->|add routing| C[Workflow System]
    C -->|add planning| D[Autonomous Agent]
    style A fill:#e8b84b,color:#fff
    style B fill:#4a9ede,color:#fff
    style C fill:#5cb85c,color:#fff
    style D fill:#9b59b6,color:#fff
```
This is a spectrum, not a binary.[^2] Most useful systems sit in the middle — structured enough to be reliable, flexible enough to handle variation. You do not need to build a fully autonomous agent. You need to understand where on this spectrum your system should live.
The first design question
Before building anything: how much autonomy does your system actually need? Start with the simplest architecture that solves the problem. Add complexity only when simpler patterns fail.[^1]
Part 2 — The anatomy of an agentic system
Every agentic system, from a simple customer support bot to a multi-agent research pipeline, is built from the same five components. Think of them as layers:[^3]
```mermaid
graph TD
    subgraph The Five Layers
        A[Instructions - who the agent is and what rules it follows]
        B[Routing - how the agent decides what to do with a request]
        C[Tools - what the agent can interact with beyond text]
        D[Knowledge - what the agent knows or can look up]
        E[Orchestration - how multiple steps and agents coordinate]
    end
    A --> B --> C --> D --> E
    style A fill:#e8b84b,color:#fff
    style B fill:#4a9ede,color:#fff
    style C fill:#5cb85c,color:#fff
    style D fill:#9b59b6,color:#fff
    style E fill:#e74c3c,color:#fff
```
| Layer | What it answers | Example |
|---|---|---|
| Instructions | Who am I? What are my rules? | A system prompt defining persona and constraints |
| Routing | What kind of request is this? | An intent classifier that sends billing questions to one handler and technical questions to another |
| Tools | What can I do beyond generating text? | API calls, database queries, code execution, file operations |
| Knowledge | What do I know? What can I look up? | A vector database, a knowledge graph, retrieved documents |
| Orchestration | How do multiple steps fit together? | A pipeline that chains retrieval, reasoning, and output formatting |
These layers exist in every system. The difference between a brittle prototype and a reliable production system is usually how explicitly each layer is defined.
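The five layers can be made explicit in configuration. A minimal sketch, assuming a hypothetical `AgentSpec` dataclass — every name below is invented for illustration, not a real framework:

```python
from dataclasses import dataclass

@dataclass
class AgentSpec:
    """Hypothetical spec that makes each of the five layers an explicit field."""
    instructions: str                # who the agent is, what rules it follows
    routes: dict[str, str]           # intent -> handler name
    tools: list[str]                 # capabilities beyond text generation
    knowledge_sources: list[str]     # what the agent can look up
    pipeline: list[str]              # ordered orchestration steps

support_bot = AgentSpec(
    instructions="You are a support agent. Never reveal internal data.",
    routes={"billing": "billing_handler", "technical": "tech_handler"},
    tools=["order_db_lookup", "send_email"],
    knowledge_sources=["faq_documents", "product_manuals"],
    pipeline=["route", "retrieve", "reason", "format_output"],
)

# With the layers explicit, each one can be reviewed and changed independently.
print(support_bot.routes["billing"])
```

The point is not the dataclass itself but the audit it enables: if you cannot fill in one of the five fields for your system, that layer is implicit — and implicit layers are where brittleness hides.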
Part 3 — Division of purpose
When you organise an agentic system into files and folders, you are making architectural decisions. Each folder should correspond to a distinct type of concern. In practice, production systems converge on a pattern:[^4]
```mermaid
graph TD
    ROOT[Project Root] --> INST[Instructions - identity and rules]
    ROOT --> PLAY[Playbooks - step-by-step behaviors]
    ROOT --> TMPL[Templates - output shapes and formats]
    ROOT --> TOOLS[Tools - external integrations and APIs]
    ROOT --> KNOW[Knowledge - reference data and documents]
    style ROOT fill:#4a9ede,color:#fff
    style INST fill:#e8b84b,color:#fff
    style PLAY fill:#5cb85c,color:#fff
    style TMPL fill:#9b59b6,color:#fff
    style TOOLS fill:#e74c3c,color:#fff
    style KNOW fill:#3498db,color:#fff
```
| Folder | Purpose | Changes when… |
|---|---|---|
| Instructions | Define who the agent is, what rules it follows, what it must never do | The agent’s identity or constraints change |
| Playbooks | Define behavior — step-by-step procedures for specific tasks | A new task type is added or an existing process changes |
| Templates | Define output shape — the format and structure of what the agent produces | The output requirements change |
| Tools | Define capabilities — APIs, database connectors, code execution environments | New integrations are needed |
| Knowledge | Store reference material — documents, data, structured knowledge | The domain knowledge evolves |
This separation matters because each concern changes at a different rate and for different reasons. Instructions are stable. Templates evolve with design. Knowledge updates constantly. When concerns are mixed into a single file, a small change to one ripples unpredictably through the others.
In plain terms
Think of it like organising a restaurant. The menu (template) describes what customers see. The recipes (playbooks) tell the kitchen how to cook each dish. The supplier contacts (tools) connect to external services. The ingredient inventory (knowledge) is what you have on hand. And the house rules (instructions) define how the restaurant operates. You would never put all of these in one document.
Part 4 — The entry point and routing
An agentic system needs to know where to start and where to go for any given request. This is the entry point problem.
The entry point is a single file (or prompt) that the agent reads first. It provides the navigation blueprint — an overview of what the system contains and how to find things. From there, routing directs the request to the correct handler based on intent.[^5]
```mermaid
graph TD
    USER[User Request] --> EP[Entry Point]
    EP --> CL[Intent Classifier]
    CL -->|billing| H1[Billing Handler]
    CL -->|technical| H2[Technical Handler]
    CL -->|general| H3[General Handler]
    CL -->|unclear| H4[Clarification]
    style EP fill:#4a9ede,color:#fff
    style CL fill:#e8b84b,color:#fff
    style H1 fill:#5cb85c,color:#fff
    style H2 fill:#5cb85c,color:#fff
    style H3 fill:#5cb85c,color:#fff
    style H4 fill:#9b59b6,color:#fff
```
Routing can be implemented at three levels of sophistication:
| Approach | How it works | Best for |
|---|---|---|
| Rule-based | Keyword matching, regex patterns | Small systems with clear categories |
| Semantic | Embedding similarity to route descriptions | Medium systems with fuzzy boundaries |
| LLM-based | A classifier model returns structured intent | Complex systems with overlapping categories |
Production systems often stack all three: fast keyword rules handle obvious cases, semantic routing catches fuzzy matches, and an LLM classifier acts as the fallback for ambiguous requests.[^5] This tiered approach balances speed, accuracy, and cost.
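The tiered idea can be sketched in a few lines. Only the keyword tier is implemented here; the semantic and LLM tiers are left as stubs, and the intents and keyword lists are hypothetical:

```python
# Tier 1: fast keyword rules for obvious cases (hypothetical categories).
KEYWORD_RULES = {
    "billing": ["invoice", "refund", "charge", "payment"],
    "technical": ["error", "crash", "bug", "install"],
}

def route(request: str) -> str:
    """Tiered routing sketch: keyword rules first, fallbacks after."""
    text = request.lower()
    for intent, keywords in KEYWORD_RULES.items():
        if any(kw in text for kw in keywords):
            return intent
    # Tier 2 (stub): semantic routing via embedding similarity would go here.
    # Tier 3 (stub): an LLM classifier would handle whatever is still ambiguous.
    return "clarification"  # route unclear requests to a clarifying handler

print(route("I was charged twice for my order"))   # billing
print(route("The app crashes on startup"))         # technical
print(route("Hello, can you help me?"))            # clarification
```

Note that the router returns an intent label, not an answer — consistent with the routing principle below it never does the work itself.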
The routing principle
A routing decision is separate from the work itself. The router does not answer the question — it decides who answers the question. This separation means each downstream handler can be narrow, focused, and excellent at its specific task.
Part 5 — Cascading context
Once the system knows where to route a request, it needs to assemble the right instructions. The naive approach is a single massive prompt that contains everything. This breaks as the system grows — the prompt becomes unmanageable, contradictions creep in, and the model loses focus in a sea of tokens.
Context cascading is the alternative. Instructions are organised in layers, from broad to specific, and loaded in sequence:[^6]
```mermaid
graph TD
    L1[Layer 1 - Global Rules] -->|constrains| L2[Layer 2 - Domain Context]
    L2 -->|constrains| L3[Layer 3 - Task Instructions]
    L3 -->|constrains| L4[Layer 4 - Output Template]
    L1 -.->|identity, safety, style| L1
    L2 -.->|architecture, capabilities| L2
    L3 -.->|step-by-step procedure| L3
    L4 -.->|format, structure, schema| L4
    style L1 fill:#e8b84b,color:#fff
    style L2 fill:#4a9ede,color:#fff
    style L3 fill:#5cb85c,color:#fff
    style L4 fill:#9b59b6,color:#fff
```
Each layer narrows the scope of the next. Layer 1 says “you are a helpful assistant that never reveals confidential data.” Layer 2 says “you are working within a customer support system with access to the order database.” Layer 3 says “the user wants a refund — follow this procedure.” Layer 4 says “format the response as a structured email with these fields.”
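Assembling the cascade is mechanical once the layers are separate. A sketch, with each layer as its own string (in practice, its own version-controlled file):

```python
# Each layer is a separate, independently maintainable piece of context.
LAYERS = [
    # Layer 1 - global rules (most stable; placed first, where influence is strongest)
    "You are a helpful assistant that never reveals confidential data.",
    # Layer 2 - domain context
    "You are working within a customer support system with access to the order database.",
    # Layer 3 - task instructions
    "The user wants a refund. Follow the refund procedure.",
    # Layer 4 - output template
    "Format the response as a structured email with greeting, decision, and next steps.",
]

def build_context(layers: list[str]) -> str:
    """Assemble cascaded context in order, broad to specific."""
    return "\n\n".join(layers)

prompt = build_context(LAYERS)
print(prompt.splitlines()[0])  # the global rule comes first
```

Swapping Layer 3 for a different task's instructions changes nothing else — which is exactly the maintainability win over a monolithic prompt.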
The order matters. Research shows that instructions placed early in context have stronger influence on model behavior, and that layered, progressive context outperforms monolithic prompts for complex tasks.[^6]
Why cascading beats monolithic prompts
A single giant prompt is like giving someone a 50-page manual before they start work. Cascading context is like an onboarding process: first the company values, then the department role, then today’s specific assignment. Each layer is self-contained, independently maintainable, and version-controllable.
Part 6 — Structuring knowledge
An agentic system is only as good as the knowledge it can access. This is where knowledge-engineering enters: the discipline of structuring knowledge so machines can use it.
Three concepts matter here:
Knowledge graphs
A knowledge graph represents knowledge as nodes (things) and edges (relationships between things). Unlike a flat database table, a graph captures how concepts relate to each other — enabling multi-hop reasoning, dependency resolution, and contextual retrieval.[^7]
```mermaid
graph LR
    A[Concept A] -->|requires| B[Concept B]
    A -->|relates to| C[Concept C]
    B -->|parent of| D[Concept D]
    C -->|parent of| E[Concept E]
    D -.->|related| E
    style A fill:#4a9ede,color:#fff
```
Graphs are organised into taxonomies — hierarchical classification systems where each level gets more specific (domain, discipline, topic, concept). A topological sort can then walk the graph to produce a valid ordering, ensuring prerequisites come before the concepts that depend on them.
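Python's standard library ships a topological sorter, so the prerequisite-ordering step needs no extra machinery. A sketch with a hypothetical prerequisite graph:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical prerequisite edges: each concept maps to the set it requires.
prerequisites = {
    "concept_b": {"concept_a"},
    "concept_c": {"concept_a"},
    "concept_d": {"concept_b"},
    "concept_e": {"concept_c"},
    "concept_a": set(),
}

# static_order() yields prerequisites before the concepts that depend on them,
# and raises CycleError if the graph contains a cycle.
order = list(TopologicalSorter(prerequisites).static_order())
print(order)
```

A learning-path generator, for example, could feed a learner's target concept's subgraph through this sort to get a valid study order.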
Machine-readable formats
Humans read prose. Machines read structure. Machine-readable formats like JSON, YAML, and XML bridge the gap — they encode knowledge in predictable structures that software can parse without ambiguity.[^8]
Any system that combines human-authored content with machine processing will maintain both forms: rich prose for people, and structured metadata for automation. The two must stay in sync.
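As a concrete illustration, a hypothetical concept card can carry both forms in one record — prose for people, structured fields for automation — and round-trip through JSON without ambiguity:

```python
import json

# Hypothetical concept card: prose and metadata kept together, in sync.
card = {
    "id": "knowledge-graphs",
    "title": "Knowledge Graphs",
    "prerequisites": ["agentic-systems"],
    "prose": "A knowledge graph represents knowledge as nodes and edges...",
}

# Serialising to JSON produces a predictable structure any software can parse.
encoded = json.dumps(card, indent=2)
decoded = json.loads(encoded)
assert decoded == card  # the round trip loses nothing
print(decoded["id"])
```

Keeping both forms in the same record (rather than in separate files) is one way to honour the "must stay in sync" constraint: a single edit updates prose and metadata together.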
Retrieval-Augmented Generation
RAG gives an LLM access to external knowledge at query time instead of relying on training data. The pattern: retrieve relevant documents from a knowledge source, augment the prompt with those documents, then generate a grounded answer.[^9]
```mermaid
graph LR
    Q[Question] --> R[Retrieve]
    R -->|search| KB[Knowledge Base]
    KB -->|relevant docs| A[Augment Prompt]
    A --> G[Generate Answer]
    style R fill:#4a9ede,color:#fff
    style A fill:#5cb85c,color:#fff
    style G fill:#9b59b6,color:#fff
```
RAG reduces hallucination because the model answers from evidence, not from vague memorisation. It also means the knowledge can be updated instantly — no retraining required.
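The retrieve-augment steps can be sketched with a toy keyword retriever standing in for embedding search — the documents and scoring below are invented for illustration, but the shape matches production RAG:

```python
# Toy knowledge base (hypothetical support documents).
DOCUMENTS = [
    "Refunds are processed within 5 business days of approval.",
    "Shipping to Europe takes 3 to 7 business days.",
    "Accounts can be deleted from the privacy settings page.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the question (embeddings in real systems)."""
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def augment(question: str, docs: list[str]) -> str:
    """Build a grounded prompt: the model must answer from the retrieved evidence."""
    context = "\n".join(docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How long do refunds take?"
prompt = augment(question, retrieve(question, DOCUMENTS))
print(prompt)
```

The generate step is then a single LLM call with `prompt` — and because the evidence is assembled at query time, updating `DOCUMENTS` updates the system's knowledge instantly.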
The knowledge design principle
Well-structured knowledge is the single biggest lever for agent reliability. Ontology-grounded systems show dramatically lower hallucination rates than unstructured ones.[^7] Invest in your knowledge architecture before investing in fancier models.
Part 7 — Pipelines and orchestration
Complex tasks cannot be handled in a single LLM call. They are decomposed into llm-pipelines — sequences of focused stages where each stage transforms data and passes it forward.[^1]
Anthropic identifies five core pipeline patterns:[^1]
| Pattern | How it works | When to use |
|---|---|---|
| Prompt chaining | Sequential stages, each consuming the prior output | Tasks with clear step-by-step dependencies |
| Routing | Classify input and direct to specialised handlers | Systems handling multiple request types |
| Parallelisation | Run independent subtasks simultaneously, then merge | Tasks with separable components |
| Orchestrator-worker | A supervisor decomposes tasks and delegates to workers | Unpredictable or open-ended problems |
| Evaluator-optimiser | Generate, evaluate, refine in a loop | Tasks requiring iterative quality improvement |
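Prompt chaining, the first pattern above, reduces to function composition. A sketch with stub functions standing in for LLM calls (the stage names and strings are invented for illustration):

```python
from typing import Callable

# Each stage is a function from text to text; in a real pipeline each would
# be an LLM call with its own focused prompt.
def draft(topic: str) -> str:
    return f"DRAFT about {topic}"

def critique(text: str) -> str:
    return f"{text} | CRITIQUE: needs a concrete example"

def revise(text: str) -> str:
    return f"{text} | REVISED"

def run_chain(stages: list[Callable[[str], str]], initial: str) -> str:
    """Prompt chaining: each stage consumes the prior stage's output."""
    result = initial
    for stage in stages:
        result = stage(result)
    return result

output = run_chain([draft, critique, revise], "context cascading")
print(output)  # DRAFT about context cascading | CRITIQUE: needs a concrete example | REVISED
```

The other four patterns vary only the control flow around the same idea: routing picks which chain runs, parallelisation runs chains concurrently, orchestrator-worker builds the stage list dynamically, and evaluator-optimiser loops critique and revise until a quality bar is met.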
Orchestration sits above pipelines. It decides what runs, when, in what order, and what happens when something fails.[^10] Think of it as the conductor of an orchestra: the musicians (agents) each play their part, but the conductor coordinates timing, dynamics, and recovery.
```mermaid
graph TD
    O[Orchestrator] --> P1[Pipeline 1]
    O --> P2[Pipeline 2]
    P1 --> G{Quality Gate}
    P2 --> G
    G -->|pass| MERGE[Merge Results]
    G -->|fail| O
    style O fill:#4a9ede,color:#fff
    style G fill:#e8b84b,color:#fff
    style MERGE fill:#5cb85c,color:#fff
```
Each playbook is a program — a structured document with triggers, steps, quality checks, and defined outputs. The playbook is to an LLM what source code is to a compiler: unambiguous instructions that produce predictable results. When playbooks are version-controlled and routed to automatically, the system becomes reproducible across sessions.
Part 8 — Humans stay in the loop
Full automation sounds appealing until something goes wrong. The solution is not less automation but smarter automation: human-in-the-loop checkpoints at the moments where human judgement adds the most value.[^11]
The autonomy spectrum runs from conservative to aggressive:
| Level | Description | When to use |
|---|---|---|
| AI suggests, human decides | AI produces options; human makes the call | High stakes (clinical, legal, financial) |
| AI acts, human approves | AI proposes; human reviews before execution | Medium stakes, irreversible actions |
| AI acts, human audits | AI executes autonomously; human samples post-hoc | Low-risk, high-volume routine work |
| AI acts autonomously | AI runs within constrained scope, no human review | Well-understood tasks with strong guardrails |
Where to place checkpoints:
- Before irreversible actions — sending an email, executing a financial transaction, deleting data
- At quality gates — after drafting, before publishing
- When confidence is low — the system routes uncertain cases to humans instead of guessing
- At domain boundaries — when a request crosses from one specialist area to another
The best systems follow dynamic load shifting: AI handles bulk work early (research, drafting, structuring), humans concentrate effort late (review, approval, quality judgement). The human does not do more work — they do different work.
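Checkpoint placement can itself be code. A sketch, assuming hypothetical confidence thresholds and an irreversibility flag (tune the numbers from measured error rates, not intuition):

```python
# Hypothetical thresholds; in production, derive these from measured metrics.
AUTO_THRESHOLD = 0.9
SUGGEST_THRESHOLD = 0.6

def dispatch(action: str, confidence: float, irreversible: bool) -> str:
    """Decide whether the AI acts, proposes, or defers to a human."""
    if irreversible or confidence < SUGGEST_THRESHOLD:
        return f"HUMAN DECIDES: {action}"   # AI suggests, human decides
    if confidence < AUTO_THRESHOLD:
        return f"HUMAN APPROVES: {action}"  # AI acts, human approves
    return f"AI EXECUTES: {action}"         # AI acts, human audits later

print(dispatch("update FAQ entry", 0.95, irreversible=False))  # AI EXECUTES
print(dispatch("send refund email", 0.95, irreversible=True))  # HUMAN DECIDES
print(dispatch("draft reply", 0.75, irreversible=False))       # HUMAN APPROVES
```

Note how the two checkpoint rules from the list above — irreversible actions and low confidence — each map to a single condition, which makes the policy easy to audit and to tighten or relax over time.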
The design rule
Start conservative on autonomy and expand as trust, monitoring, and guardrails mature. Measure error rates, false positives, and time-to-human-remediation. Let metrics drive the shift.[^1]
What you now understand
Mental models you have gained
- The autonomy spectrum — agentic systems range from simple chatbots to autonomous agents; choose the level your problem actually requires
- Five layers — instructions, routing, tools, knowledge, and orchestration are the components of every agentic system
- Division of purpose — separate playbooks (behavior), templates (output), tools (capabilities), and knowledge (reference) into distinct concerns
- Entry points and routing — every system needs a navigation blueprint and intent classification
- Context cascading — layer instructions from broad to specific instead of dumping everything into one prompt
- Knowledge architecture — graphs, taxonomies, and RAG are the structures that make agents reliable
- Pipeline patterns — chaining, routing, parallelisation, orchestrator-worker, and evaluator-optimiser
- Human-in-the-loop — place checkpoints where the cost of an AI error exceeds the cost of a human review
Check your understanding
Test yourself before moving on
- Explain the difference between a chatbot, a workflow system, and an autonomous agent. Where on the spectrum would you place a customer support system that handles refunds automatically but escalates complaints to humans?
- Describe the five layers of an agentic system and give a concrete example of each for a system of your choice.
- Distinguish between context cascading and a monolithic system prompt. Why does cascading scale better as the system grows?
- Interpret this scenario: an AI agent with access to tools and a knowledge base keeps producing inconsistent outputs across sessions. Which layer is most likely the problem, and what design principle would you apply to fix it?
- Design a simple agentic system for a use case of your choice. Sketch the folder structure, define 3 routes in the routing table, and identify where you would place human checkpoints.
Where to go next
I want to build my own agentic system
You understand the patterns — now apply them. Start a project through the learning pipeline: define your intent, and the system will match relevant concepts, resolve prerequisites, and generate a custom learning path.
Best for: People ready to move from understanding to doing.
I want to understand the tech stack underneath
The AI layer sits on top of software fundamentals. If you have not read it yet, from-zero-to-building covers the base layer: frontend, backend, APIs, databases, and the document chain that connects intent to code.
Best for: People who want to understand the full stack, not just the AI layer.
I want to explore the concept cards
Every concept mentioned in this article has its own card with deeper explanations, diagrams, and comprehension questions. Start with agentic-systems or knowledge-graphs and follow the links.
Best for: People who learn by exploring and following connections.
Sources
Further reading
- Building Effective Agents (Anthropic) — The definitive reference on agentic workflow patterns: prompt chaining, routing, parallelisation, orchestrator-workers
- Effective Context Engineering for AI Agents (Anthropic) — How production systems organise layered context, memory, and instructions
- AI Agent Design Patterns (Microsoft Azure) — Production-grade orchestration patterns with architectural diagrams
- Developer’s Guide to Multi-Agent Patterns (Google ADK) — Multi-agent coordination, routing, and handoff patterns from Google
- Graphs Meet AI Agents (arXiv) — Comprehensive survey on how knowledge graphs enable better agent reasoning and retrieval
Footnotes

[^1]: Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic. The foundational reference on pipeline patterns, routing, and when to use workflows vs agents.

[^2]: Falconer, S. (2026). The Practical Guide to the Levels of AI Agent Autonomy. Medium. Maps agentic systems to SAE-style autonomy tiers.

[^3]: Anthropic. (2025). Effective Context Engineering for AI Agents. Anthropic. Covers the layered architecture of production agent systems.

[^4]: LangChain. (2025). Workflows and Agents. LangGraph documentation. Practical implementation patterns for agent file organisation and orchestration graphs.

[^5]: Google. (2025). Architecting Efficient Context-Aware Multi-Agent Framework for Production. Google Developers Blog. Multi-tier routing patterns combining deterministic rules, semantic routing, and LLM classification.

[^6]: Anthropic. (2025). Effective Context Engineering for AI Agents. Anthropic. Research on why layered, progressive context outperforms monolithic prompts.

[^7]: PubMed. (2025). Ontology-grounded Knowledge Graphs for Mitigating Hallucinations in LLMs for Clinical QA. An ontology-grounded GraphRAG system achieved approximately 1.7% hallucination with 98% accuracy vs baseline LLMs.

[^8]: Medium. (2025). Beyond JSON: Picking the Right Format for LLM Pipelines. Comparison of machine-readable formats for AI systems.

[^9]: Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020. The original paper introducing RAG and the parametric vs non-parametric knowledge distinction.

[^10]: Microsoft. (2025). AI Agent Orchestration Patterns. Azure Architecture Center. Comprehensive reference on orchestration design patterns.

[^11]: Anthropic. (2025). Claude Code Auto Mode: A Safer Way to Skip Permissions. Anthropic. Tiered permission and confidence-based routing for human-in-the-loop decisions.