How to Think About AI Systems Before You Build One
You have used ChatGPT, Claude, or Copilot. You have seen what they can do. Now you want to understand how to design a system that uses AI — not just call it. This article builds the mental architecture for that.
Who this is for
You have used AI tools. You may have read from-zero-to-building and understand software fundamentals. Now you want to understand the AI layer: how AI systems are structured, how they decide what to do, and why some feel smart while others feel brittle.
What this article is NOT
This is not a coding tutorial. This is a design thinking article — it teaches the mental models that underpin every well-built AI system. The implementation details come later.
Part 1 — What makes a system “agentic”
When you type a question into ChatGPT and get a response, you are using a chatbot. One message in, one message out. The model does not plan, does not use tools, does not remember what happened yesterday.
An agentic system goes further. It can break a goal into sub-tasks, select tools, execute actions, evaluate results, and adjust its approach — all within boundaries set by its designer.[^1] The difference is not intelligence. It is architecture.
Anthropic, who build Claude, put it clearly: “workflows are systems where LLMs are used inside predefined code paths, whereas agents are systems where the model dynamically directs its own process and tool use.”[^1]
```mermaid
graph LR
    A[Chatbot] -->|add tool use| B[Tool-Augmented LLM]
    B -->|add routing| C[Workflow System]
    C -->|add planning| D[Autonomous Agent]
    style A fill:#e8b84b,color:#fff
    style B fill:#4a9ede,color:#fff
    style C fill:#5cb85c,color:#fff
    style D fill:#9b59b6,color:#fff
```
This is a spectrum, not a binary.[^2] Most useful systems sit in the middle — structured enough to be reliable, flexible enough to handle variation. You do not need to build a fully autonomous agent. You need to understand where on this spectrum your system should live.
The first design question
Before building anything: how much autonomy does your system actually need? Start with the simplest architecture that solves the problem. Add complexity only when simpler patterns fail.[^1]
Part 2 — The anatomy of an agentic system
Every agentic system, from a simple customer support bot to a multi-agent research pipeline, is built from the same five components. Think of them as layers:[^3]
```mermaid
graph TD
    subgraph The Five Layers
        A[Instructions - who the agent is and what rules it follows]
        B[Routing - how the agent decides what to do with a request]
        C[Tools - what the agent can interact with beyond text]
        D[Knowledge - what the agent knows or can look up]
        E[Orchestration - how multiple steps and agents coordinate]
    end
    A --> B --> C --> D --> E
    style A fill:#e8b84b,color:#fff
    style B fill:#4a9ede,color:#fff
    style C fill:#5cb85c,color:#fff
    style D fill:#9b59b6,color:#fff
    style E fill:#e74c3c,color:#fff
```
| Layer | What it answers | Example |
|---|---|---|
| Instructions | Who am I? What are my rules? | A system prompt defining persona and constraints |
| Routing | What kind of request is this? | An intent classifier that sends billing questions to one handler and technical questions to another |
| Tools | What can I do beyond generating text? | API calls, database queries, code execution, file operations |
| Knowledge | What do I know? What can I look up? | A vector database, a knowledge graph, retrieved documents |
| Orchestration | How do multiple steps fit together? | A pipeline that chains retrieval, reasoning, and output formatting |
These layers exist in every system. The difference between a brittle prototype and a reliable production system is usually how explicitly each layer is defined.
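The five layers can be made explicit in configuration. A minimal sketch, assuming a hypothetical `AgentSpec` dataclass — every name below is invented for illustration, not a real framework:

```python
from dataclasses import dataclass

@dataclass
class AgentSpec:
    """Hypothetical spec that makes each of the five layers an explicit field."""
    instructions: str                # who the agent is, what rules it follows
    routes: dict[str, str]           # intent -> handler name
    tools: list[str]                 # capabilities beyond text generation
    knowledge_sources: list[str]     # what the agent can look up
    pipeline: list[str]              # ordered orchestration steps

support_bot = AgentSpec(
    instructions="You are a support agent. Never reveal internal data.",
    routes={"billing": "billing_handler", "technical": "tech_handler"},
    tools=["order_db_lookup", "send_email"],
    knowledge_sources=["faq_documents", "product_manuals"],
    pipeline=["route", "retrieve", "reason", "format_output"],
)

# With the layers explicit, each one can be reviewed and changed independently.
print(support_bot.routes["billing"])
```

The point is not the dataclass itself but the audit it enables: if you cannot fill in one of the five fields for your system, that layer is implicit — and implicit layers are where brittleness hides.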
Part 3 — Division of purpose
When you organise an agentic system into files and folders, you are making architectural decisions. Each folder should correspond to a distinct type of concern. In practice, production systems converge on a pattern:[^4]
```mermaid
graph TD
    ROOT[Project Root] --> INST[Instructions - identity and rules]
    ROOT --> PLAY[Playbooks - step-by-step behaviors]
    ROOT --> TMPL[Templates - output shapes and formats]
    ROOT --> TOOLS[Tools - external integrations and APIs]
    ROOT --> KNOW[Knowledge - reference data and documents]
    style ROOT fill:#4a9ede,color:#fff
    style INST fill:#e8b84b,color:#fff
    style PLAY fill:#5cb85c,color:#fff
    style TMPL fill:#9b59b6,color:#fff
    style TOOLS fill:#e74c3c,color:#fff
    style KNOW fill:#3498db,color:#fff
```
| Folder | Purpose | Changes when… |
|---|---|---|
| Instructions | Define who the agent is, what rules it follows, what it must never do | The agent’s identity or constraints change |
| Playbooks | Define behavior — step-by-step procedures for specific tasks | A new task type is added or an existing process changes |
| Templates | Define output shape — the format and structure of what the agent produces | The output requirements change |
| Tools | Define capabilities — APIs, database connectors, code execution environments | New integrations are needed |
| Knowledge | Store reference material — documents, data, structured knowledge | The domain knowledge evolves |
This separation matters because each concern changes at a different rate and for different reasons. Instructions are stable. Templates evolve with design. Knowledge updates constantly. When concerns are mixed into a single file, a small change to one ripples unpredictably through the others.
In plain terms
Think of it like organising a restaurant. The menu (template) describes what customers see. The recipes (playbooks) tell the kitchen how to cook each dish. The supplier contacts (tools) connect to external services. The ingredient inventory (knowledge) is what you have on hand. And the house rules (instructions) define how the restaurant operates. You would never put all of these in one document.
Part 4 — The entry point and routing
An agentic system needs to know where to start and where to go for any given request. This is the entry point problem.
The entry point is a single file (or prompt) that the agent reads first. It provides the navigation blueprint — an overview of what the system contains and how to find things. From there, routing directs the request to the correct handler based on intent.[^5]
```mermaid
graph TD
    USER[User Request] --> EP[Entry Point]
    EP --> CL[Intent Classifier]
    CL -->|billing| H1[Billing Handler]
    CL -->|technical| H2[Technical Handler]
    CL -->|general| H3[General Handler]
    CL -->|unclear| H4[Clarification]
    style EP fill:#4a9ede,color:#fff
    style CL fill:#e8b84b,color:#fff
    style H1 fill:#5cb85c,color:#fff
    style H2 fill:#5cb85c,color:#fff
    style H3 fill:#5cb85c,color:#fff
    style H4 fill:#9b59b6,color:#fff
```
Routing can be implemented at three levels of sophistication:
| Approach | How it works | Best for |
|---|---|---|
| Rule-based | Keyword matching, regex patterns | Small systems with clear categories |
| Semantic | Embedding similarity to route descriptions | Medium systems with fuzzy boundaries |
| LLM-based | A classifier model returns structured intent | Complex systems with overlapping categories |
Production systems often stack all three: fast keyword rules handle obvious cases, semantic routing catches fuzzy matches, and an LLM classifier acts as the fallback for ambiguous requests.[^5] This tiered approach balances speed, accuracy, and cost.
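The tiered idea can be sketched in a few lines. Only the keyword tier is implemented here; the semantic and LLM tiers are left as stubs, and the intents and keyword lists are hypothetical:

```python
# Tier 1: fast keyword rules for obvious cases (hypothetical categories).
KEYWORD_RULES = {
    "billing": ["invoice", "refund", "charge", "payment"],
    "technical": ["error", "crash", "bug", "install"],
}

def route(request: str) -> str:
    """Tiered routing sketch: keyword rules first, fallbacks after."""
    text = request.lower()
    for intent, keywords in KEYWORD_RULES.items():
        if any(kw in text for kw in keywords):
            return intent
    # Tier 2 (stub): semantic routing via embedding similarity would go here.
    # Tier 3 (stub): an LLM classifier would handle whatever is still ambiguous.
    return "clarification"  # route unclear requests to a clarifying handler

print(route("I was charged twice for my order"))   # billing
print(route("The app crashes on startup"))         # technical
print(route("Hello, can you help me?"))            # clarification
```

Note that the router returns an intent label, not an answer — consistent with the routing principle below it never does the work itself.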
The routing principle
A routing decision is separate from the work itself. The router does not answer the question — it decides who answers the question. This separation means each downstream handler can be narrow, focused, and excellent at its specific task.
Part 5 — Cascading context
Once the system knows where to route a request, it needs to assemble the right instructions. The naive approach is a single massive prompt that contains everything. This breaks as the system grows — the prompt becomes unmanageable, contradictions creep in, and the model loses focus in a sea of tokens.
Context cascading is the alternative. Instructions are organised in layers, from broad to specific, and loaded in sequence:[^6]
```mermaid
graph TD
    L1[Layer 1 - Global Rules] -->|constrains| L2[Layer 2 - Domain Context]
    L2 -->|constrains| L3[Layer 3 - Task Instructions]
    L3 -->|constrains| L4[Layer 4 - Output Template]
    L1 -.->|identity, safety, style| L1
    L2 -.->|architecture, capabilities| L2
    L3 -.->|step-by-step procedure| L3
    L4 -.->|format, structure, schema| L4
    style L1 fill:#e8b84b,color:#fff
    style L2 fill:#4a9ede,color:#fff
    style L3 fill:#5cb85c,color:#fff
    style L4 fill:#9b59b6,color:#fff
```
Each layer narrows the scope of the next. Layer 1 says “you are a helpful assistant that never reveals confidential data.” Layer 2 says “you are working within a customer support system with access to the order database.” Layer 3 says “the user wants a refund — follow this procedure.” Layer 4 says “format the response as a structured email with these fields.”
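Assembling the cascade is mechanical once the layers are separate. A sketch, with each layer as its own string (in practice, its own version-controlled file):

```python
# Each layer is a separate, independently maintainable piece of context.
LAYERS = [
    # Layer 1 - global rules (most stable; placed first, where influence is strongest)
    "You are a helpful assistant that never reveals confidential data.",
    # Layer 2 - domain context
    "You are working within a customer support system with access to the order database.",
    # Layer 3 - task instructions
    "The user wants a refund. Follow the refund procedure.",
    # Layer 4 - output template
    "Format the response as a structured email with greeting, decision, and next steps.",
]

def build_context(layers: list[str]) -> str:
    """Assemble cascaded context in order, broad to specific."""
    return "\n\n".join(layers)

prompt = build_context(LAYERS)
print(prompt.splitlines()[0])  # the global rule comes first
```

Swapping Layer 3 for a different task's instructions changes nothing else — which is exactly the maintainability win over a monolithic prompt.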
The order matters. Research shows that instructions placed early in context have stronger influence on model behavior, and that layered, progressive context outperforms monolithic prompts for complex tasks.[^6]
Why cascading beats monolithic prompts
A single giant prompt is like giving someone a 50-page manual before they start work. Cascading context is like an onboarding process: first the company values, then the department role, then today’s specific assignment. Each layer is self-contained, independently maintainable, and version-controllable.
Part 6 — Structuring knowledge
An agentic system is only as good as the knowledge it can access. This is where knowledge-engineering enters: the discipline of structuring knowledge so machines can use it.
Three concepts matter here:
Knowledge graphs
A knowledge graph represents knowledge as nodes (things) and edges (relationships between things). Unlike a flat database table, a graph captures how concepts relate to each other — enabling multi-hop reasoning, dependency resolution, and contextual retrieval.[^7]
```mermaid
graph LR
    A[Concept A] -->|requires| B[Concept B]
    A -->|relates to| C[Concept C]
    B -->|parent of| D[Concept D]
    C -->|parent of| E[Concept E]
    D -.->|related| E
    style A fill:#4a9ede,color:#fff
```
Graphs are organised into taxonomies — hierarchical classification systems where each level gets more specific (domain, discipline, topic, concept). A topological sort can then walk the graph to produce a valid ordering, ensuring prerequisites come before the concepts that depend on them.
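Python's standard library ships a topological sorter, so the prerequisite-ordering step needs no extra machinery. A sketch with a hypothetical prerequisite graph:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical prerequisite edges: each concept maps to the set it requires.
prerequisites = {
    "concept_b": {"concept_a"},
    "concept_c": {"concept_a"},
    "concept_d": {"concept_b"},
    "concept_e": {"concept_c"},
    "concept_a": set(),
}

# static_order() yields prerequisites before the concepts that depend on them,
# and raises CycleError if the graph contains a cycle.
order = list(TopologicalSorter(prerequisites).static_order())
print(order)
```

A learning-path generator, for example, could feed a learner's target concept's subgraph through this sort to get a valid study order.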
Machine-readable formats
Humans read prose. Machines read structure. Machine-readable formats like JSON, YAML, and XML bridge the gap — they encode knowledge in predictable structures that software can parse without ambiguity.[^8]
Any system that combines human-authored content with machine processing will maintain both forms: rich prose for people, and structured metadata for automation. The two must stay in sync.
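As a concrete illustration, a hypothetical concept card can carry both forms in one record — prose for people, structured fields for automation — and round-trip through JSON without ambiguity:

```python
import json

# Hypothetical concept card: prose and metadata kept together, in sync.
card = {
    "id": "knowledge-graphs",
    "title": "Knowledge Graphs",
    "prerequisites": ["agentic-systems"],
    "prose": "A knowledge graph represents knowledge as nodes and edges...",
}

# Serialising to JSON produces a predictable structure any software can parse.
encoded = json.dumps(card, indent=2)
decoded = json.loads(encoded)
assert decoded == card  # the round trip loses nothing
print(decoded["id"])
```

Keeping both forms in the same record (rather than in separate files) is one way to honour the "must stay in sync" constraint: a single edit updates prose and metadata together.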
Retrieval-Augmented Generation
RAG gives an LLM access to external knowledge at query time instead of relying on training data. The pattern: retrieve relevant documents from a knowledge source, augment the prompt with those documents, then generate a grounded answer.[^9]
```mermaid
graph LR
    Q[Question] --> R[Retrieve]
    R -->|search| KB[Knowledge Base]
    KB -->|relevant docs| A[Augment Prompt]
    A --> G[Generate Answer]
    style R fill:#4a9ede,color:#fff
    style A fill:#5cb85c,color:#fff
    style G fill:#9b59b6,color:#fff
```
RAG reduces hallucination because the model answers from evidence, not from vague memorisation. It also means the knowledge can be updated instantly — no retraining required.
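The retrieve-augment steps can be sketched with a toy keyword retriever standing in for embedding search — the documents and scoring below are invented for illustration, but the shape matches production RAG:

```python
# Toy knowledge base (hypothetical support documents).
DOCUMENTS = [
    "Refunds are processed within 5 business days of approval.",
    "Shipping to Europe takes 3 to 7 business days.",
    "Accounts can be deleted from the privacy settings page.",
]

def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Rank documents by word overlap with the question (embeddings in real systems)."""
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def augment(question: str, docs: list[str]) -> str:
    """Build a grounded prompt: the model must answer from the retrieved evidence."""
    context = "\n".join(docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

question = "How long do refunds take?"
prompt = augment(question, retrieve(question, DOCUMENTS))
print(prompt)
```

The generate step is then a single LLM call with `prompt` — and because the evidence is assembled at query time, updating `DOCUMENTS` updates the system's knowledge instantly.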
The knowledge design principle
Well-structured knowledge is the single biggest lever for agent reliability. Ontology-grounded systems show dramatically lower hallucination rates than unstructured ones.[^7] Invest in your knowledge architecture before investing in fancier models.
Part 7 — Pipelines and orchestration
Complex tasks cannot be handled in a single LLM call. They are decomposed into llm-pipelines — sequences of focused stages where each stage transforms data and passes it forward.[^1]
Anthropic identifies five core pipeline patterns:[^1]
| Pattern | How it works | When to use |
|---|---|---|
| Prompt chaining | Sequential stages, each consuming the prior output | Tasks with clear step-by-step dependencies |
| Routing | Classify input and direct to specialised handlers | Systems handling multiple request types |
| Parallelisation | Run independent subtasks simultaneously, then merge | Tasks with separable components |
| Orchestrator-worker | A supervisor decomposes tasks and delegates to workers | Unpredictable or open-ended problems |
| Evaluator-optimiser | Generate, evaluate, refine in a loop | Tasks requiring iterative quality improvement |
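Prompt chaining, the first pattern above, reduces to function composition. A sketch with stub functions standing in for LLM calls (the stage names and strings are invented for illustration):

```python
from typing import Callable

# Each stage is a function from text to text; in a real pipeline each would
# be an LLM call with its own focused prompt.
def draft(topic: str) -> str:
    return f"DRAFT about {topic}"

def critique(text: str) -> str:
    return f"{text} | CRITIQUE: needs a concrete example"

def revise(text: str) -> str:
    return f"{text} | REVISED"

def run_chain(stages: list[Callable[[str], str]], initial: str) -> str:
    """Prompt chaining: each stage consumes the prior stage's output."""
    result = initial
    for stage in stages:
        result = stage(result)
    return result

output = run_chain([draft, critique, revise], "context cascading")
print(output)  # DRAFT about context cascading | CRITIQUE: needs a concrete example | REVISED
```

The other four patterns vary only the control flow around the same idea: routing picks which chain runs, parallelisation runs chains concurrently, orchestrator-worker builds the stage list dynamically, and evaluator-optimiser loops critique and revise until a quality bar is met.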
Orchestration sits above pipelines. It decides what runs, when, in what order, and what happens when something fails.[^10] Think of it as the conductor of an orchestra: the musicians (agents) each play their part, but the conductor coordinates timing, dynamics, and recovery.
```mermaid
graph TD
    O[Orchestrator] --> P1[Pipeline 1]
    O --> P2[Pipeline 2]
    P1 --> G{Quality Gate}
    P2 --> G
    G -->|pass| MERGE[Merge Results]
    G -->|fail| O
    style O fill:#4a9ede,color:#fff
    style G fill:#e8b84b,color:#fff
    style MERGE fill:#5cb85c,color:#fff
```
Each playbook is a program — a structured document with triggers, steps, quality checks, and defined outputs. The playbook is to an LLM what source code is to a compiler: unambiguous instructions that produce predictable results. When playbooks are version-controlled and routed to automatically, the system becomes reproducible across sessions.
Part 8 — Humans stay in the loop
Full automation sounds appealing until something goes wrong. The solution is not less automation but smarter automation: human-in-the-loop checkpoints at the moments where human judgement adds the most value.[^11]
The autonomy spectrum runs from conservative to aggressive:
| Level | Description | When to use |
|---|---|---|
| AI suggests, human decides | AI produces options; human makes the call | High stakes (clinical, legal, financial) |
| AI acts, human approves | AI proposes; human reviews before execution | Medium stakes, irreversible actions |
| AI acts, human audits | AI executes autonomously; human samples post-hoc | Low-risk, high-volume routine work |
| AI acts autonomously | AI runs within constrained scope, no human review | Well-understood tasks with strong guardrails |
Where to place checkpoints:
- Before irreversible actions — sending an email, executing a financial transaction, deleting data
- At quality gates — after drafting, before publishing
- When confidence is low — the system routes uncertain cases to humans instead of guessing
- At domain boundaries — when a request crosses from one specialist area to another
The best systems follow dynamic load shifting: AI handles bulk work early (research, drafting, structuring), humans concentrate effort late (review, approval, quality judgement). The human does not do more work — they do different work.
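Checkpoint placement can itself be code. A sketch, assuming hypothetical confidence thresholds and an irreversibility flag (tune the numbers from measured error rates, not intuition):

```python
# Hypothetical thresholds; in production, derive these from measured metrics.
AUTO_THRESHOLD = 0.9
SUGGEST_THRESHOLD = 0.6

def dispatch(action: str, confidence: float, irreversible: bool) -> str:
    """Decide whether the AI acts, proposes, or defers to a human."""
    if irreversible or confidence < SUGGEST_THRESHOLD:
        return f"HUMAN DECIDES: {action}"   # AI suggests, human decides
    if confidence < AUTO_THRESHOLD:
        return f"HUMAN APPROVES: {action}"  # AI acts, human approves
    return f"AI EXECUTES: {action}"         # AI acts, human audits later

print(dispatch("update FAQ entry", 0.95, irreversible=False))  # AI EXECUTES
print(dispatch("send refund email", 0.95, irreversible=True))  # HUMAN DECIDES
print(dispatch("draft reply", 0.75, irreversible=False))       # HUMAN APPROVES
```

Note how the two checkpoint rules from the list above — irreversible actions and low confidence — each map to a single condition, which makes the policy easy to audit and to tighten or relax over time.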
The design rule
Start conservative on autonomy and expand as trust, monitoring, and guardrails mature. Measure error rates, false positives, and time-to-human-remediation. Let metrics drive the shift.[^1]
What you now understand
Mental models you have gained
- The autonomy spectrum — agentic systems range from simple chatbots to autonomous agents; choose the level your problem actually requires
- Five layers — instructions, routing, tools, knowledge, and orchestration are the components of every agentic system
- Division of purpose — separate playbooks (behavior), templates (output), tools (capabilities), and knowledge (reference) into distinct concerns
- Entry points and routing — every system needs a navigation blueprint and intent classification
- Context cascading — layer instructions from broad to specific instead of dumping everything into one prompt
- Knowledge architecture — graphs, taxonomies, and RAG are the structures that make agents reliable
- Pipeline patterns — chaining, routing, parallelisation, orchestrator-worker, and evaluator-optimiser
- Human-in-the-loop — place checkpoints where the cost of an AI error exceeds the cost of a human review
Check your understanding
Test yourself before moving on
- Explain the difference between a chatbot, a workflow system, and an autonomous agent. Where on the spectrum would you place a customer support system that handles refunds automatically but escalates complaints to humans?
- Describe the five layers of an agentic system and give a concrete example of each for a system of your choice.
- Distinguish between context cascading and a monolithic system prompt. Why does cascading scale better as the system grows?
- Interpret this scenario: an AI agent with access to tools and a knowledge base keeps producing inconsistent outputs across sessions. Which layer is most likely the problem, and what design principle would you apply to fix it?
- Design a simple agentic system for a use case of your choice. Sketch the folder structure, define 3 routes in the routing table, and identify where you would place human checkpoints.
Where to go next
I want to build my own agentic system
You understand the patterns — now apply them. Start a project through the learning pipeline: define your intent, and the system will match relevant concepts, resolve prerequisites, and generate a custom learning path.
Best for: People ready to move from understanding to doing.
I want to understand the tech stack underneath
The AI layer sits on top of software fundamentals. If you have not read it yet, from-zero-to-building covers the base layer: frontend, backend, APIs, databases, and the document chain that connects intent to code.
Best for: People who want to understand the full stack, not just the AI layer.
I want to explore the concept cards
Every concept mentioned in this article has its own card with deeper explanations, diagrams, and comprehension questions. Start with agentic-systems or knowledge-graphs and follow the links.
Best for: People who learn by exploring and following connections.
Sources
Further reading
- Building Effective Agents (Anthropic) — The definitive reference on agentic workflow patterns: prompt chaining, routing, parallelisation, orchestrator-workers
- Effective Context Engineering for AI Agents (Anthropic) — How production systems organise layered context, memory, and instructions
- AI Agent Design Patterns (Microsoft Azure) — Production-grade orchestration patterns with architectural diagrams
- Developer’s Guide to Multi-Agent Patterns (Google ADK) — Multi-agent coordination, routing, and handoff patterns from Google
- Graphs Meet AI Agents (arXiv) — Comprehensive survey on how knowledge graphs enable better agent reasoning and retrieval
Footnotes

[^1]: Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic. The foundational reference on pipeline patterns, routing, and when to use workflows vs agents.

[^2]: Falconer, S. (2026). The Practical Guide to the Levels of AI Agent Autonomy. Medium. Maps agentic systems to SAE-style autonomy tiers.

[^3]: Anthropic. (2025). Effective Context Engineering for AI Agents. Anthropic. Covers the layered architecture of production agent systems.

[^4]: LangChain. (2025). Workflows and Agents. LangGraph documentation. Practical implementation patterns for agent file organisation and orchestration graphs.

[^5]: Google. (2025). Architecting Efficient Context-Aware Multi-Agent Framework for Production. Google Developers Blog. Multi-tier routing patterns combining deterministic rules, semantic routing, and LLM classification.

[^6]: Anthropic. (2025). Effective Context Engineering for AI Agents. Anthropic. Research on why layered, progressive context outperforms monolithic prompts.

[^7]: PubMed. (2025). Ontology-grounded Knowledge Graphs for Mitigating Hallucinations in LLMs for Clinical QA. An ontology-grounded GraphRAG system achieved approximately 1.7% hallucination with 98% accuracy vs baseline LLMs.

[^8]: Medium. (2025). Beyond JSON: Picking the Right Format for LLM Pipelines. Comparison of machine-readable formats for AI systems.

[^9]: Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020. The original paper introducing RAG and the parametric vs non-parametric knowledge distinction.

[^10]: Microsoft. (2025). AI Agent Orchestration Patterns. Azure Architecture Center. Comprehensive reference on orchestration design patterns.

[^11]: Anthropic. (2025). Claude Code Auto Mode: A Safer Way to Skip Permissions. Anthropic. Tiered permission and confidence-based routing for human-in-the-loop decisions.