How to Think About AI Systems Before You Build One

You have used ChatGPT, Claude, or Copilot. You have seen what they can do. Now you want to understand how to design a system that uses AI — not just call it. This article builds the mental architecture for that.


Who this is for

You have used AI tools. You may have read from-zero-to-building and understand software fundamentals. Now you want to understand the AI layer: how AI systems are structured, how they decide what to do, and why some feel smart while others feel brittle.

What this article is NOT

This is not a coding tutorial. This is a design thinking article — it teaches the mental models that underpin every well-built AI system. The implementation details come later.


Part 1 — What makes a system “agentic”

When you type a question into ChatGPT and get a response, you are using a chatbot. One message in, one message out. The model does not plan, does not use tools, does not remember what happened yesterday.

An agentic system goes further. It can break a goal into sub-tasks, select tools, execute actions, evaluate results, and adjust its approach — all within boundaries set by its designer.1 The difference is not intelligence. It is architecture.

Anthropic, who build Claude, put it clearly: “workflows are systems where LLMs are used inside predefined code paths, whereas agents are systems where the model dynamically directs its own process and tool use.”1

graph LR
    A[Chatbot] -->|add tool use| B[Tool-Augmented LLM]
    B -->|add routing| C[Workflow System]
    C -->|add planning| D[Autonomous Agent]

    style A fill:#e8b84b,color:#fff
    style B fill:#4a9ede,color:#fff
    style C fill:#5cb85c,color:#fff
    style D fill:#9b59b6,color:#fff

This is a spectrum, not a binary.2 Most useful systems sit in the middle — structured enough to be reliable, flexible enough to handle variation. You do not need to build a fully autonomous agent. You need to understand where on this spectrum your system should live.

The first design question

Before building anything: how much autonomy does your system actually need? Start with the simplest architecture that solves the problem. Add complexity only when simpler patterns fail.1


Part 2 — The anatomy of an agentic system

Every agentic system, from a simple customer support bot to a multi-agent research pipeline, is built from the same five components. Think of them as layers:3

graph TD
    subgraph The Five Layers
    A[Instructions - who the agent is and what rules it follows]
    B[Routing - how the agent decides what to do with a request]
    C[Tools - what the agent can interact with beyond text]
    D[Knowledge - what the agent knows or can look up]
    E[Orchestration - how multiple steps and agents coordinate]
    end

    A --> B --> C --> D --> E

    style A fill:#e8b84b,color:#fff
    style B fill:#4a9ede,color:#fff
    style C fill:#5cb85c,color:#fff
    style D fill:#9b59b6,color:#fff
    style E fill:#e74c3c,color:#fff

| Layer | What it answers | Example |
| --- | --- | --- |
| Instructions | Who am I? What are my rules? | A system prompt defining persona and constraints |
| Routing | What kind of request is this? | An intent classifier that sends billing questions to one handler and technical questions to another |
| Tools | What can I do beyond generating text? | API calls, database queries, code execution, file operations |
| Knowledge | What do I know? What can I look up? | A vector database, a knowledge graph, retrieved documents |
| Orchestration | How do multiple steps fit together? | A pipeline that chains retrieval, reasoning, and output formatting |

These layers exist in every system. The difference between a brittle prototype and a reliable production system is usually how explicitly each layer is defined.
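To make the five layers concrete, here is a minimal sketch in Python. Every name in it (`Agent`, `handle`, the intent labels) is invented for illustration; no particular framework is assumed, and real handlers would call an LLM rather than return strings.

```python
from dataclasses import dataclass
from typing import Callable

# Invented sketch of the five layers as explicit components.
# `knowledge` is unused in this tiny loop but shown as its own layer.

@dataclass
class Agent:
    instructions: str                        # layer 1: identity and rules
    classify: Callable[[str], str]           # layer 2: routing decision
    routes: dict[str, Callable[[str], str]]  # layer 3: tools / handlers
    knowledge: dict[str, str]                # layer 4: reference material

    def handle(self, request: str) -> str:
        """Layer 5, orchestration: classify, pick a handler, run it."""
        intent = self.classify(request)
        handler = self.routes.get(intent, self.routes["general"])
        return handler(request)
```

The point of the sketch is that each layer is a separate, swappable value: you can change the classifier without touching the handlers, or the instructions without touching the orchestration.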


Part 3 — Division of purpose

When you organise an agentic system into files and folders, you are making architectural decisions. Each folder should correspond to a distinct type of concern. In practice, production systems converge on a pattern:4

graph TD
    ROOT[Project Root] --> INST[Instructions - identity and rules]
    ROOT --> PLAY[Playbooks - step-by-step behaviors]
    ROOT --> TMPL[Templates - output shapes and formats]
    ROOT --> TOOLS[Tools - external integrations and APIs]
    ROOT --> KNOW[Knowledge - reference data and documents]

    style ROOT fill:#4a9ede,color:#fff
    style INST fill:#e8b84b,color:#fff
    style PLAY fill:#5cb85c,color:#fff
    style TMPL fill:#9b59b6,color:#fff
    style TOOLS fill:#e74c3c,color:#fff
    style KNOW fill:#3498db,color:#fff

| Folder | Purpose | Changes when… |
| --- | --- | --- |
| Instructions | Define who the agent is, what rules it follows, what it must never do | The agent’s identity or constraints change |
| Playbooks | Define behavior — step-by-step procedures for specific tasks | A new task type is added or an existing process changes |
| Templates | Define output shape — the format and structure of what the agent produces | The output requirements change |
| Tools | Define capabilities — APIs, database connectors, code execution environments | New integrations are needed |
| Knowledge | Store reference material — documents, data, structured knowledge | The domain knowledge evolves |

This separation matters because each concern changes at a different rate and for different reasons. Instructions are stable. Templates evolve with design. Knowledge updates constantly. When concerns are mixed into a single file, a small change to one ripples unpredictably through the others.

In plain terms

Think of it like organising a restaurant. The menu (template) describes what customers see. The recipes (playbooks) tell the kitchen how to cook each dish. The supplier contacts (tools) connect to external services. The ingredient inventory (knowledge) is what you have on hand. And the house rules (instructions) define how the restaurant operates. You would never put all of these in one document.


Part 4 — The entry point and routing

An agentic system needs to know where to start and where to go for any given request. This is the entry point problem.

The entry point is a single file (or prompt) that the agent reads first. It provides the navigation blueprint — an overview of what the system contains and how to find things. From there, routing directs the request to the correct handler based on intent.5

graph TD
    USER[User Request] --> EP[Entry Point]
    EP --> CL[Intent Classifier]
    CL -->|billing| H1[Billing Handler]
    CL -->|technical| H2[Technical Handler]
    CL -->|general| H3[General Handler]
    CL -->|unclear| H4[Clarification]

    style EP fill:#4a9ede,color:#fff
    style CL fill:#e8b84b,color:#fff
    style H1 fill:#5cb85c,color:#fff
    style H2 fill:#5cb85c,color:#fff
    style H3 fill:#5cb85c,color:#fff
    style H4 fill:#9b59b6,color:#fff

Routing can be implemented at three levels of sophistication:

| Approach | How it works | Best for |
| --- | --- | --- |
| Rule-based | Keyword matching, regex patterns | Small systems with clear categories |
| Semantic | Embedding similarity to route descriptions | Medium systems with fuzzy boundaries |
| LLM-based | A classifier model returns structured intent | Complex systems with overlapping categories |

Production systems often stack all three: fast keyword rules handle obvious cases, semantic routing catches fuzzy matches, and an LLM classifier acts as the fallback for ambiguous requests.5 This tiered approach balances speed, accuracy, and cost.

The routing principle

A routing decision is separate from the work itself. The router does not answer the question — it decides who answers the question. This separation means each downstream handler can be narrow, focused, and excellent at its specific task.


Part 5 — Cascading context

Once the system knows where to route a request, it needs to assemble the right instructions. The naive approach is a single massive prompt that contains everything. This breaks as the system grows — the prompt becomes unmanageable, contradictions creep in, and the model loses focus in a sea of tokens.

Context cascading is the alternative. Instructions are organised in layers, from broad to specific, and loaded in sequence:6

graph TD
    L1[Layer 1 - Global Rules] -->|constrains| L2[Layer 2 - Domain Context]
    L2 -->|constrains| L3[Layer 3 - Task Instructions]
    L3 -->|constrains| L4[Layer 4 - Output Template]

    L1 -.->|identity, safety, style| L1
    L2 -.->|architecture, capabilities| L2
    L3 -.->|step-by-step procedure| L3
    L4 -.->|format, structure, schema| L4

    style L1 fill:#e8b84b,color:#fff
    style L2 fill:#4a9ede,color:#fff
    style L3 fill:#5cb85c,color:#fff
    style L4 fill:#9b59b6,color:#fff

Each layer narrows the scope of the next. Layer 1 says “you are a helpful assistant that never reveals confidential data.” Layer 2 says “you are working within a customer support system with access to the order database.” Layer 3 says “the user wants a refund — follow this procedure.” Layer 4 says “format the response as a structured email with these fields.”
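A cascade like this can be assembled mechanically. In the sketch below the layer texts are placeholders and `assemble_context` is an invented helper, not a library function; the only real claim is the ordering, broad before specific.

```python
# Layers ordered broad-to-specific; contents are illustrative placeholders.
LAYERS = [
    ("global rules",
     "You are a helpful assistant. Never reveal confidential data."),
    ("domain context",
     "You work in a customer support system with order-database access."),
    ("task instructions",
     "The user wants a refund. Follow the refund procedure."),
    ("output template",
     "Reply as a structured email with subject, body, and next steps."),
]

def assemble_context(layers: list[tuple[str, str]]) -> str:
    """Concatenate layers broad-to-specific; earlier layers come first
    because early instructions carry more weight."""
    return "\n\n".join(f"## {name}\n{text}" for name, text in layers)
```

Because each layer is a separate entry, a layer can be edited, swapped, or version-controlled without touching the others.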

The order matters. Research shows that instructions placed early in context have stronger influence on model behavior, and that layered, progressive context outperforms monolithic prompts for complex tasks.6

Why cascading beats monolithic prompts

A single giant prompt is like giving someone a 50-page manual before they start work. Cascading context is like an onboarding process: first the company values, then the department role, then today’s specific assignment. Each layer is self-contained, independently maintainable, and version-controllable.


Part 6 — Structuring knowledge

An agentic system is only as good as the knowledge it can access. This is where knowledge engineering enters: the discipline of structuring knowledge so that machines can use it.

Three concepts matter here:

Knowledge graphs

A knowledge graph represents knowledge as nodes (things) and edges (relationships between things). Unlike a flat database table, a graph captures how concepts relate to each other — enabling multi-hop reasoning, dependency resolution, and contextual retrieval.7

graph LR
    A[Concept A] -->|requires| B[Concept B]
    A -->|relates to| C[Concept C]
    B -->|parent of| D[Concept D]
    C -->|parent of| E[Concept E]
    D -.->|related| E

    style A fill:#4a9ede,color:#fff

Graphs are organised into taxonomies — hierarchical classification systems where each level gets more specific (domain, discipline, topic, concept). A topological sort can then walk the graph to produce a valid order, ensuring prerequisites come before the things that depend on them.
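Python's standard library can perform this prerequisite walk directly. The graph below is invented for illustration; each concept maps to the set of concepts it requires.

```python
from graphlib import TopologicalSorter

# Invented prerequisite graph: concept -> concepts it requires.
# TopologicalSorter emits prerequisites before their dependants.
prereqs = {
    "LLM basics": set(),
    "embeddings": {"LLM basics"},
    "tool use":   {"LLM basics"},
    "RAG":        {"embeddings"},
    "agents":     {"RAG", "tool use"},
}

learning_order = list(TopologicalSorter(prereqs).static_order())
```

The same mechanism also detects cycles: if two concepts require each other, `static_order()` raises an error instead of silently producing an invalid order.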

Machine-readable formats

Humans read prose. Machines read structure. Machine-readable formats like JSON, YAML, and XML bridge the gap — they encode knowledge in predictable structures that software can parse without ambiguity.8

Any system that combines human-authored content with machine processing will maintain both forms: rich prose for people, and structured metadata for automation. The two must stay in sync.
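One common way to keep the two forms in sync is to derive the machine-readable view from a single record, so the metadata can never drift from its source. A hedged sketch, with invented field names:

```python
import json
from dataclasses import dataclass, asdict

# Invented record type holding prose and structured metadata together.
# Software parses the metadata; humans read the body.

@dataclass
class ConceptCard:
    id: str
    title: str
    prerequisites: list[str]
    body: str  # rich prose for humans

    def metadata_json(self) -> str:
        """Machine-readable view: everything except the prose body."""
        record = asdict(self)
        record.pop("body")
        return json.dumps(record, indent=2)
```

Because both views come from one object, updating the card updates the JSON automatically; there is no second document to forget.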

Retrieval-Augmented Generation

RAG gives an LLM access to external knowledge at query time instead of relying on training data. The pattern: retrieve relevant documents from a knowledge source, augment the prompt with those documents, then generate a grounded answer.9

graph LR
    Q[Question] --> R[Retrieve]
    R -->|search| KB[Knowledge Base]
    KB -->|relevant docs| A[Augment Prompt]
    A --> G[Generate Answer]

    style R fill:#4a9ede,color:#fff
    style A fill:#5cb85c,color:#fff
    style G fill:#9b59b6,color:#fff

RAG reduces hallucination because the model answers from retrieved evidence rather than from memorisation alone. It also means the knowledge can be updated instantly, with no retraining required.
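The retrieve-augment-generate loop can be sketched in a few lines. Word overlap stands in for vector search, the document store is two invented strings, and the generate step is stubbed: the function returns the assembled prompt so the grounding is visible, where a real system would send it to an LLM.

```python
# Invented two-document knowledge base.
DOCS = {
    "returns-policy": "Refunds are issued within 14 days of purchase.",
    "shipping": "Orders ship within 2 business days.",
}

def retrieve(question: str, k: int = 1) -> list[str]:
    """Rank documents by words shared with the question
    (a stand-in for embedding similarity)."""
    q = set(question.lower().split())
    ranked = sorted(DOCS.values(),
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def answer(question: str) -> str:
    evidence = retrieve(question)                 # retrieve
    return ("Context:\n" + "\n".join(evidence)    # augment
            + f"\n\nQuestion: {question}")        # generate (stubbed)
```

Swapping the keyword scorer for an embedding index changes nothing else in the loop, which is why retrieval quality can be improved independently of the model.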

The knowledge design principle

Well-structured knowledge is the single biggest lever for agent reliability. Ontology-grounded systems show dramatically lower hallucination rates than unstructured ones.7 Invest in your knowledge architecture before investing in fancier models.


Part 7 — Pipelines and orchestration

Complex tasks cannot be handled in a single LLM call. They are decomposed into LLM pipelines — sequences of focused stages where each stage transforms data and passes it forward.1

Anthropic identifies five core pipeline patterns:1

| Pattern | How it works | When to use |
| --- | --- | --- |
| Prompt chaining | Sequential stages, each consuming the prior output | Tasks with clear step-by-step dependencies |
| Routing | Classify input and direct to specialised handlers | Systems handling multiple request types |
| Parallelisation | Run independent subtasks simultaneously, then merge | Tasks with separable components |
| Orchestrator-worker | A supervisor decomposes tasks and delegates to workers | Unpredictable or open-ended problems |
| Evaluator-optimiser | Generate, evaluate, refine in a loop | Tasks requiring iterative quality improvement |
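The first pattern, prompt chaining, reduces to folding a list of stage functions over the input. The stages below are stubs standing in for LLM calls; the stage names are invented for illustration.

```python
from typing import Callable

# Each stage is text -> text; real stages would each be an LLM call.
def outline(topic: str) -> str:
    return f"outline({topic})"

def draft(outline_text: str) -> str:
    return f"draft({outline_text})"

def polish(draft_text: str) -> str:
    return f"polish({draft_text})"

def chain(stages: list[Callable[[str], str]], data: str) -> str:
    """Run stages in order, each consuming the prior output."""
    for stage in stages:
        data = stage(data)
    return data
```

Because each stage has the same shape, stages can be reordered, inserted, or tested in isolation — which is exactly what makes chaining more maintainable than one monolithic prompt.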

Orchestration sits above pipelines. It decides what runs, when, in what order, and what happens when something fails.10 Think of it as the conductor of an orchestra: the musicians (agents) each play their part, but the conductor coordinates timing, dynamics, and recovery.

graph TD
    O[Orchestrator] --> P1[Pipeline 1]
    O --> P2[Pipeline 2]
    P1 --> G{Quality Gate}
    P2 --> G
    G -->|pass| MERGE[Merge Results]
    G -->|fail| O

    style O fill:#4a9ede,color:#fff
    style G fill:#e8b84b,color:#fff
    style MERGE fill:#5cb85c,color:#fff

Each playbook is a program — a structured document with triggers, steps, quality checks, and defined outputs. The playbook is to an LLM what source code is to a compiler: unambiguous instructions that produce predictable results. When playbooks are version-controlled and selected automatically by the router, the system becomes reproducible across sessions.


Part 8 — Humans stay in the loop

Full automation sounds appealing until something goes wrong. The solution is not less automation but smarter automation: human-in-the-loop checkpoints at the moments where human judgement adds the most value.11

The autonomy spectrum runs from conservative to aggressive:

| Level | Description | When to use |
| --- | --- | --- |
| AI suggests, human decides | AI produces options; human makes the call | High stakes (clinical, legal, financial) |
| AI acts, human approves | AI proposes; human reviews before execution | Medium stakes, irreversible actions |
| AI acts, human audits | AI executes autonomously; human samples post-hoc | Low-risk, high-volume routine work |
| AI acts autonomously | AI runs within constrained scope, no human review | Well-understood tasks with strong guardrails |

Where to place checkpoints:

  • Before irreversible actions — sending an email, executing a financial transaction, deleting data
  • At quality gates — after drafting, before publishing
  • When confidence is low — the system routes uncertain cases to humans instead of guessing
  • At domain boundaries — when a request crosses from one specialist area to another

The best systems follow dynamic load shifting: AI handles bulk work early (research, drafting, structuring), humans concentrate effort late (review, approval, quality judgement). The human does not do more work — they do different work.
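A checkpoint policy combining these ideas can be expressed as a small routing function. The action names, outcome labels, and confidence threshold below are all invented for illustration.

```python
# Actions whose effects cannot be undone always get a human approver.
IRREVERSIBLE = {"send_email", "delete_data", "issue_refund"}

def checkpoint(action: str, confidence: float,
               threshold: float = 0.9) -> str:
    """Decide who acts: the system, or a human at a checkpoint."""
    if action in IRREVERSIBLE:
        return "human_approval"   # AI acts, human approves
    if confidence < threshold:
        return "human_review"     # low confidence goes to a person
    return "auto"                 # routine, high-confidence work
```

Note that the policy checks reversibility before confidence: a highly confident system still should not delete data unreviewed.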

The design rule

Start conservative on autonomy and expand as trust, monitoring, and guardrails mature. Measure error rates, false positives, and time-to-human-remediation. Let metrics drive the shift.1


What you now understand

Mental models you have gained

  • The autonomy spectrum — agentic systems range from simple chatbots to autonomous agents; choose the level your problem actually requires
  • Five layers — instructions, routing, tools, knowledge, and orchestration are the components of every agentic system
  • Division of purpose — separate playbooks (behavior), templates (output), tools (capabilities), and knowledge (reference) into distinct concerns
  • Entry points and routing — every system needs a navigation blueprint and intent classification
  • Context cascading — layer instructions from broad to specific instead of dumping everything into one prompt
  • Knowledge architecture — graphs, taxonomies, and RAG are the structures that make agents reliable
  • Pipeline patterns — chaining, routing, parallelisation, orchestrator-worker, and evaluator-optimiser
  • Human-in-the-loop — place checkpoints where the cost of an AI error exceeds the cost of a human review


Where to go next

I want to build my own agentic system

You understand the patterns — now apply them. Start a project through the learning pipeline: define your intent, and the system will match relevant concepts, resolve prerequisites, and generate a custom learning path.

Best for: People ready to move from understanding to doing.

I want to understand the tech stack underneath

The AI layer sits on top of software fundamentals. If you have not read it yet, from-zero-to-building covers the base layer: frontend, backend, APIs, databases, and the document chain that connects intent to code.

Best for: People who want to understand the full stack, not just the AI layer.

I want to explore the concept cards

Every concept mentioned in this article has its own card with deeper explanations, diagrams, and comprehension questions. Start with agentic-systems or knowledge-graphs and follow the links.

Best for: People who learn by exploring and following connections.



Footnotes

  1. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic. The foundational reference on pipeline patterns, routing, and when to use workflows vs agents.

  2. Falconer, S. (2026). The Practical Guide to the Levels of AI Agent Autonomy. Medium. Maps agentic systems to SAE-style autonomy tiers.

  3. Anthropic. (2025). Effective Context Engineering for AI Agents. Anthropic. Covers the layered architecture of production agent systems.

  4. LangChain. (2025). Workflows and Agents. LangGraph documentation. Practical implementation patterns for agent file organisation and orchestration graphs.

  5. Google. (2025). Architecting Efficient Context-Aware Multi-Agent Framework for Production. Google Developers Blog. Multi-tier routing patterns combining deterministic rules, semantic routing, and LLM classification.

  6. Anthropic. (2025). Effective Context Engineering for AI Agents. Anthropic. Research on why layered, progressive context outperforms monolithic prompts.

  7. PubMed. (2025). Ontology-grounded Knowledge Graphs for Mitigating Hallucinations in LLMs for Clinical QA. An ontology-grounded GraphRAG system achieved approximately 1.7% hallucination with 98% accuracy vs baseline LLMs.

  8. Medium. (2025). Beyond JSON: Picking the Right Format for LLM Pipelines. Comparison of machine-readable formats for AI systems.

  9. Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020. The original paper introducing RAG and the parametric vs non-parametric knowledge distinction.

  10. Microsoft. (2025). AI Agent Orchestration Patterns. Azure Architecture Center. Comprehensive reference on orchestration design patterns.

  11. Anthropic. (2025). Claude Code Auto Mode: A Safer Way to Skip Permissions. Anthropic. Tiered permission and confidence-based routing for human-in-the-loop decisions.