Multi-Agent Systems
Architectures where multiple AI agents — each with a specialised role — collaborate to accomplish tasks that exceed what any single agent could handle alone.
What is it?
A single AI agent can reason, use tools, and follow a plan. But some tasks are too broad, too complex, or too time-consuming for one agent working alone. A multi-agent system solves this by splitting work across multiple specialised agents that collaborate — each focusing on a narrow part of the problem while a coordination mechanism ensures their efforts combine into a coherent result.1
The idea is not new. Human organisations have always worked this way: a hospital has specialists who each handle one aspect of a patient’s care, a newsroom has reporters, editors, and fact-checkers working in parallel, and a software company has designers, engineers, and testers with distinct responsibilities. Multi-agent AI systems apply the same principle of division of labour to AI agents.2
The parent concept, agentic-systems, introduces multi-agent coordination as the highest level of the autonomy spectrum — beyond a single tool-using agent, you reach systems where agents delegate to other agents. orchestration provides the coordination layer, and llm-pipelines provide the internal structure of each agent’s workflow. Multi-agent systems combine both: each agent runs its own pipeline internally, and orchestration manages how agents interact externally.3
A crucial principle from Anthropic’s research: use the simplest architecture that works.3 A single agent with good tool access solves most tasks. Multi-agent systems earn their complexity only when the task genuinely requires parallel exploration, exceeds a single agent’s context window, or demands specialised capabilities that no single prompt can cover. Google’s research shows 39–70% performance degradation when teams add agents beyond what a task requires.4
In plain terms
A single agent is like one person doing all the work on a project. A multi-agent system is like a team — one person researches, another writes, a third reviews. Each person is a specialist, and someone coordinates so the pieces fit together. The team finishes faster and produces better work than any one person could alone, but only if the project is big enough to justify the coordination overhead.
At a glance
Four architectural patterns (click to expand)
```mermaid
graph TD
    subgraph "Supervisor/Worker"
        S1[Supervisor] --> W1[Worker A]
        S1 --> W2[Worker B]
        S1 --> W3[Worker C]
        W1 --> S1
        W2 --> S1
        W3 --> S1
    end
    subgraph "Swarm/Peer-to-Peer"
        A1[Agent A] --> A2[Agent B]
        A2 --> A3[Agent C]
        A3 --> A1
    end
    subgraph "Hierarchical"
        H1[Manager] --> H2[Supervisor X]
        H1 --> H3[Supervisor Y]
        H2 --> H4[Worker 1]
        H2 --> H5[Worker 2]
        H3 --> H6[Worker 3]
    end
    subgraph "Sequential Handoff"
        SQ1[Agent 1] --> SQ2[Agent 2]
        SQ2 --> SQ3[Agent 3]
        SQ3 --> SQ4[Result]
    end
```
Key: Four core patterns for organising multiple agents. Supervisor/Worker centralises control in one coordinator. Swarm lets agents self-organise through handoffs. Hierarchical creates layers of delegation. Sequential handoff passes work through a chain of specialists. Most production systems use hybrids of these patterns.
How does it work?
Multi-agent systems are built from a small set of architectural patterns that determine how agents coordinate. Each pattern makes different trade-offs between control, speed, flexibility, and debuggability.5
1. Supervisor/Worker — centralised delegation
The most common production pattern. A single supervisor agent receives a task, decomposes it into subtasks, delegates each to a specialised worker agent, monitors progress, and assembles the final output. Workers do not communicate with each other — all coordination flows through the supervisor.5
The supervisor holds the global context: it knows the overall goal, tracks which subtasks are complete, and decides when the work is done. Workers are narrowly focused — one handles database queries, another writes code, a third calls external APIs. This is Anthropic’s “orchestrator-workers” pattern: the key difference from a fixed pipeline is that the subtasks are not predefined but determined dynamically by the supervisor based on the specific input.3
| Component | Role |
|---|---|
| Supervisor | Decomposes task, delegates, monitors, synthesises |
| Workers | Execute narrow subtasks independently |
| Communication | Hub-and-spoke — all flows through supervisor |
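The hub-and-spoke flow can be sketched in a few lines. This is a toy illustration, not a real framework: plain Python functions stand in for LLM-backed agents, and the worker roles and task split are invented.

```python
# Supervisor/worker sketch. Each "agent" is a plain function standing in
# for an LLM-backed agent; roles and the decomposition are illustrative.

def research_worker(subtask: str) -> str:
    # Narrow specialist: would search and summarise in a real system
    return f"findings for '{subtask}'"

def code_worker(subtask: str) -> str:
    # Narrow specialist: would write and test code in a real system
    return f"code for '{subtask}'"

WORKERS = {"research": research_worker, "code": code_worker}

def supervisor(task: str) -> str:
    # 1. Decompose: a real supervisor would plan subtasks dynamically
    #    via an LLM call, based on the specific input.
    subtasks = [("research", f"background on {task}"),
                ("code", f"prototype for {task}")]
    # 2. Delegate: hub-and-spoke — every result flows back to the supervisor,
    #    and workers never talk to each other.
    results = [WORKERS[role](sub) for role, sub in subtasks]
    # 3. Synthesise: combine worker outputs into one coherent answer.
    return " | ".join(results)

print(supervisor("rate limiting"))
```

Note that the supervisor accumulates every worker result before synthesising — exactly the context-window bottleneck discussed under weaknesses below.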
Think of it like...
A film director on set. The director does not operate the camera, act in scenes, or edit footage. They hold the overall vision, decide what each specialist does, review results, and adjust the plan when something is not working. The camera operators, actors, and editors are the workers — each excellent at one thing, coordinated by the director’s decisions.
Example: research orchestration (click to expand)
Anthropic’s own Research feature uses this pattern. When a user submits a complex query, a lead agent analyses it, develops a research strategy, and spawns subagents to explore different aspects simultaneously. Each subagent independently searches, evaluates sources, and returns findings. The lead agent synthesises results and decides whether more research is needed.1
Their internal evaluations showed that a multi-agent system with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents outperformed a single-agent Claude Opus 4 by 90.2% on research benchmarks. The key advantage: subagents operate in parallel with separate context windows, enabling breadth-first exploration that a single agent cannot match.1
Strengths: Easy to debug (single control flow), clear audit trail, natural error recovery (supervisor retries failed workers).
Weaknesses: Supervisor is a single point of failure and a throughput bottleneck. It must hold all worker results in its context window, which limits scalability for tasks producing many intermediate outputs.5
Concept to explore
See supervisor-worker for a deeper dive into task decomposition strategies, worker design, and scaling considerations.
2. Swarm/Peer-to-peer — emergent coordination
The swarm pattern eliminates centralised control. Agents operate as autonomous peers that make local decisions based on shared state or handoff protocols. There is no supervisor. Coordination emerges from simple local rules applied by many agents simultaneously — the same principle behind ant colonies, bird flocks, and distributed consensus.5
In practice, swarm agents share a blackboard (a shared memory or state store) and use handoff protocols to transfer tasks. Each agent has a set of capabilities and can hand off to another agent when it encounters work outside its specialisation. The active agent responds directly to the user or the next agent — no intermediary reprocesses the output.6
LangChain’s benchmarks show that swarm patterns use roughly 40% fewer tokens than supervisor patterns and achieve faster end-to-end response times, because they eliminate the “translation overhead” of a supervisor reprocessing every worker’s output.4 However, the supervisor provides stronger guarantees for reliability, error recovery, and auditability.
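A handoff-based swarm can be sketched as follows. There is no supervisor: the active agent either answers the user directly or transfers control to a named peer, and agents coordinate through a shared blackboard. The agent names and routing rules here are invented for illustration.

```python
# Swarm/handoff sketch: no central coordinator. Each agent returns either
# ("answer", reply) or ("handoff", next_agent). Names are illustrative.

def triage(msg: str, state: dict):
    # Local rule: route billing questions to the billing specialist
    if "refund" in msg:
        return ("handoff", "billing")
    return ("answer", "general help")

def billing(msg: str, state: dict):
    state["blackboard"].append("refund case opened")  # shared state
    return ("answer", "refund initiated")

AGENTS = {"triage": triage, "billing": billing}

def run(msg: str):
    state = {"blackboard": []}
    agent = "triage"
    while True:
        action, payload = AGENTS[agent](msg, state)
        if action == "answer":
            # The active agent replies directly — no intermediary
            # reprocesses its output (the "translation overhead" saving).
            return payload, state
        agent = payload  # handoff transfers control to a peer
```

The loop has no global arbiter, which is also why convergence and sequencing are harder to guarantee than in the supervisor pattern.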
Think of it like...
A jazz ensemble improvising. No conductor tells each musician when to play. Instead, each musician listens to the others, knows when to take the lead, and hands the spotlight to the next player when the moment feels right. The music emerges from local awareness and mutual response, not from a central plan. The result can be brilliant, but it is harder to predict or debug than an orchestrated piece.
Strengths: No single point of failure, high scalability (no coordination bottleneck), faster for exploratory tasks where the optimal path is unknown.
Weaknesses: Hard to observe and debug (requires distributed tracing), no global arbiter to enforce sequencing, convergence can be unpredictable.5
Concept to explore
See swarm-architecture for handoff protocols, shared state design, and convergence strategies.
3. Hierarchical decomposition — agents as tools
The hierarchical pattern organises agents into a tree structure with multiple levels of delegation. A top-level manager agent delegates to mid-level supervisors, who delegate to leaf-level workers. Each level adds a layer of abstraction: the top level reasons about strategy, mid-levels reason about tactics, and workers execute specific actions.5
This is also expressed as the agents-as-tools pattern: higher-level agents treat lower-level agents as callable tools, invoking them for sub-problems just as they would invoke an API or database query. This creates layered control and modularity — a higher-level agent does not need to know how a sub-agent accomplishes its task, only that it returns a result in the expected format.3
The critical advantage is context window management. No single agent needs to hold the full context of the entire problem. The top-level agent holds the high-level objective and summary results. Mid-level agents hold their team’s context. Workers hold only their specific input. This allows hierarchical systems to tackle problems that would overflow any single agent’s context window.5
Example: enterprise-scale audit (click to expand)
Consider auditing a large codebase across multiple domains. A top-level agent receives the goal “audit this repository for security, performance, and maintainability.” It delegates to three domain supervisors:
| Level | Agent | Responsibility |
|---|---|---|
| Manager | Audit orchestrator | Sets criteria, delegates domains, synthesises final report |
| Supervisor | Security lead | Breaks security audit into auth, injection, encryption checks |
| Supervisor | Performance lead | Breaks performance audit into query optimisation, caching, profiling |
| Workers | File-level agents | Each analyses specific files for their assigned concern |

Each layer compresses results upward. Workers return findings, supervisors synthesise domain reports, and the manager produces the final audit. No single agent needs to hold the entire codebase in context.
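The layered compression can be sketched in miniature. This is a toy sketch under stated assumptions: the string truncation stands in for an LLM summarisation step, and the file names, domains, and findings are all invented.

```python
# Agents-as-tools sketch: each level holds only its own context and
# compresses results upward. Truncation stands in for an LLM summary.

def file_worker(file_name: str) -> str:
    # Leaf worker: sees only one file, returns a narrow finding
    return f"3 issues in {file_name}"

def domain_supervisor(domain: str, files: list[str]) -> str:
    # Mid level: holds its team's findings, never other domains'
    findings = "; ".join(file_worker(f) for f in files)
    return f"{domain}: {findings}"[:80]  # compress before passing upward

def audit_manager(repo: dict[str, list[str]]) -> str:
    # Top level: holds only the one-line domain summaries
    reports = [domain_supervisor(d, fs) for d, fs in repo.items()]
    return "AUDIT REPORT\n" + "\n".join(reports)

repo = {"security": ["auth.py", "crypto.py"], "performance": ["cache.py"]}
print(audit_manager(repo))
```

The `[:80]` cut makes the weakness concrete too: whatever a worker found beyond the summary budget is lost at that layer.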
Strengths: Scales to large problems (20+ agents), manages context effectively, mirrors organisational structures.
Weaknesses: Latency accumulates at each level (2+ seconds per LLM call per level). Information loss at each summarisation step — a nuanced worker finding may be compressed to a single sentence by the supervisor.5
4. Sequential handoff — the relay race
A chain of specialised agents where each passes control to the next. Agent 1 completes its work and hands off to Agent 2, which completes its work and hands off to Agent 3. Each agent has full control during its turn and passes a structured output to its successor.7
This differs from a simple pipeline in a critical way: each stage is a full agent with autonomous reasoning and tool use, not just a prompted LLM call. The handoff includes context, intermediate results, and sometimes instructions for the next agent. LlamaIndex’s AgentWorkflow implements this pattern with explicit handoff directives that specify which agent receives control and what context it needs.7
For example, a content creation workflow might use:
| Agent | Role | Handoff output |
|---|---|---|
| Research agent | Gathers sources and extracts key facts | Structured fact sheet |
| Drafting agent | Writes the article from the fact sheet | Draft document |
| Review agent | Checks accuracy, style, completeness | Annotated draft with corrections |
| Publishing agent | Formats and publishes the final version | Published article |
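The chain above can be sketched with each stage passing a structured artefact to its successor. The stage logic below is illustrative placeholder work, not a real content pipeline.

```python
# Sequential handoff sketch: each stage is a specialist that receives a
# structured artefact and hands a new one to the next stage.

from dataclasses import dataclass

@dataclass
class Artifact:
    stage: str    # which stage produced this
    content: str  # the structured handoff payload

def research_agent(topic: str) -> Artifact:
    return Artifact("facts", f"fact sheet on {topic}")

def drafting_agent(facts: Artifact) -> Artifact:
    return Artifact("draft", f"article from: {facts.content}")

def review_agent(draft: Artifact) -> Artifact:
    return Artifact("reviewed", draft.content + " [checked]")

def pipeline(topic: str) -> Artifact:
    # Fixed order: total latency is the sum of all stages
    return review_agent(drafting_agent(research_agent(topic)))
```

Typing the handoff payload (`Artifact`) is the point of the sketch — a fumbled baton here is usually an unstructured string that the next agent misparses.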
Think of it like...
A relay race. Each runner (agent) sprints their leg of the race and passes the baton (context and results) to the next. Each runner is a specialist in their segment. The handoff is the critical moment — a fumbled baton (lost context, poor formatting) costs more time than a slow leg.
Strengths: Predictable execution order, clear stage boundaries, easy to monitor and debug. Each agent can use a model and prompt optimised for its specific role.
Weaknesses: Cannot handle tasks where execution order depends on intermediate results. Total latency is the sum of all stages.5
Why multi-agent? The case for and against
Multi-agent systems solve specific problems that single agents cannot address efficiently:1
- Division of labour. Different parts of a problem require different expertise, tools, and prompts. Specialised agents outperform generalist agents on focused tasks.
- Context window limits. A single agent’s context window is finite. Multi-agent systems distribute information across separate context windows, enabling work on problems that exceed any single agent’s capacity.1
- Parallelism. Independent subtasks run simultaneously, reducing total time from the sum of all tasks to the duration of the slowest one. See parallelisation for the underlying pattern.
- Scalability. Adding a new capability means adding a new agent, not rewriting the entire system.
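The parallelism point is easy to demonstrate: dispatching independent subtasks concurrently makes wall-clock time approach the slowest subtask rather than the sum. A sketch with threads standing in for subagents (the queries and timings are invented):

```python
# Parallelism sketch: three independent subtasks of ~0.1s each finish in
# ~0.1s wall-clock when run concurrently, not ~0.3s sequentially.

import time
from concurrent.futures import ThreadPoolExecutor

def subagent(query: str) -> str:
    time.sleep(0.1)  # stands in for a slow LLM or tool call
    return f"result: {query}"

queries = ["market size", "competitors", "regulation"]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(queries)) as pool:
    results = list(pool.map(subagent, queries))
elapsed = time.perf_counter() - start
# elapsed is close to one subtask's duration, not the sum of all three
```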
But multi-agent is not always the answer. Anthropic’s core principle is clear: start with the simplest solution that works, and add complexity only when necessary.3
When NOT to use multi-agent
- The task fits within a single agent’s context window and capability set
- Subtasks have heavy sequential dependencies (multi-agent adds coordination overhead without parallel benefit)
- The cost of coordination exceeds the benefit of specialisation
- A single agent with good tool access already solves the problem reliably
Google’s research demonstrates 39–70% performance degradation when teams add agents beyond what the task requires. Most production failures are coordination problems, not capability problems — adding more agents to a coordination problem makes it worse.4
Inter-agent communication patterns
How agents share information determines the system’s behaviour as much as the agents themselves.8
Shared state (blackboard pattern). All agents read from and write to a common state store. Simple to implement, but risks contention and requires careful concurrency management. Used in swarm architectures where agents discover and respond to each other’s work through the shared state.
Message passing. Agents send structured messages directly to each other or through a message broker. More complex than shared state, but enables loose coupling and asynchronous operation. Used in mesh and hierarchical architectures.
Handoff directives. An agent explicitly transfers control to another agent, passing along context and instructions. The simplest form of inter-agent communication. Used in sequential handoff and swarm patterns.
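A minimal sketch of the message-passing pattern: each agent has its own inbox and agents stay loosely coupled through structured messages. An in-process dictionary of queues stands in for a real message broker here, and the agent names are invented.

```python
# Message-passing sketch: one inbox queue per agent. In production this
# would be a message broker; here a dict of queues is enough to show
# loose coupling and asynchronous replies.

from queue import Empty, Queue

inboxes: dict[str, Queue] = {"planner": Queue(), "coder": Queue()}

def send(sender: str, to: str, body: str) -> None:
    # Structured message: receiver only depends on the message schema,
    # not on the sender's internals (loose coupling).
    inboxes[to].put({"from": sender, "body": body})

def receive(agent: str):
    try:
        return inboxes[agent].get_nowait()
    except Empty:
        return None  # nothing waiting — agents operate asynchronously

send("planner", "coder", "implement the parser")
msg = receive("coder")
if msg:
    send("coder", msg["from"], "parser done")  # asynchronous reply
```

Contrast with the blackboard pattern above: here no agent ever reads another's state directly, which avoids contention at the cost of more plumbing.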
The shared vs isolated context trade-off
Shared context means all agents see the same information — good for coherence, bad for scalability (context window bloat and attention dilution). Isolated context means each agent has only what it needs — good for focus and scalability, bad for coherence (agents may produce contradictory outputs). Most production systems use a hybrid: agents have isolated working context but share results through a structured aggregation layer.1
Why do we use it?
Key reasons
1. Tasks exceed single-agent capacity. Some problems are too broad for one agent’s context window, too complex for one prompt, or too time-consuming for sequential processing. Multi-agent systems distribute the work across agents that each handle a manageable piece.1
2. Specialisation improves quality. A security-focused agent with a security-optimised prompt outperforms a generalist agent trying to check security, performance, and style in one pass. Each agent can use tailored prompts, tools, and even different models suited to its specific task.3
3. Parallel execution saves time. Anthropic’s multi-agent research system cut research time by up to 90% for complex queries by running subagents in parallel instead of sequentially.1
4. Modularity enables evolution. Adding a new capability means adding a new agent, not rewriting the system. Replacing one agent (upgrading its model, changing its tools) does not affect the others. This mirrors the microservices principle in software architecture.5
When do we use it?
- When the task requires parallel exploration of multiple independent directions (research, due diligence, competitive analysis)
- When the problem exceeds a single context window — too many documents, too much code, too many data sources for one agent to hold
- When different parts of the task require different specialised tools or expertise that cannot be effectively combined in one prompt
- When you need to scale a workflow by adding capacity (more agents) without redesigning the system
- When latency matters and independent subtasks can run simultaneously
Rule of thumb
If a single agent with good tool access can solve the task reliably within its context window, use a single agent. Reach for multi-agent only when the task genuinely requires parallel work, exceeds context limits, or demands specialisation that one prompt cannot cover.3
How can I think about it?
The hospital emergency department
An emergency department is a multi-agent system where every professional has a specialised role.
- The triage nurse = the routing agent. Assesses incoming patients and decides who handles each case. Does not treat patients — classifies and delegates.
- The ER doctor = the supervisor agent. Holds the overall picture for a patient, makes high-level decisions, delegates specific tasks.
- The radiologist = a specialist worker agent. Called when imaging is needed, operates independently with their own tools (scanners), returns structured results (scan reports) to the supervising doctor.
- The lab technician = another specialist worker. Runs blood tests in parallel with the radiologist’s work. Neither waits for the other.
- The patient handoff between shifts = sequential handoff. The outgoing doctor passes context (charts, status, plan) to the incoming doctor, who takes full control.
- The hospital’s shared medical record = the blackboard (shared state). Every specialist reads and writes to the same record, maintaining coherence across the team.
No single doctor handles every aspect of care. The system’s quality comes from specialisation, parallel work, and coordination — the same properties that make multi-agent AI systems effective.
The film production crew
A film set is a multi-agent system with clear hierarchical delegation and parallel execution.
- The director = the top-level supervisor. Holds the creative vision, decomposes each scene into shots, delegates execution.
- The assistant directors = mid-level supervisors. Each manages a domain: one coordinates actors, another manages extras, a third handles logistics.
- Camera operators, sound engineers, lighting technicians = specialist workers. Each has narrow expertise and specialised tools. They work in parallel during each take.
- The script supervisor = the shared state manager. Tracks continuity across takes and scenes so that independently filmed shots will fit together coherently.
- Post-production handoff = sequential handoff. Filmed footage passes from the set to editing, then to sound design, then to colour grading. Each stage is handled by a different specialist team.
- The wrap meeting = the aggregation step. The director reviews all work, identifies gaps, and decides whether reshoots (retries) are needed.
The film’s quality comes from each specialist doing one thing well, the director coordinating the whole, and the production system ensuring nothing falls through the cracks.
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| supervisor-worker | Task decomposition, worker design, and the orchestrator-worker pattern in depth | stub |
| swarm-architecture | Decentralised coordination, handoff protocols, and convergence strategies | stub |
| agent-communication | Message passing, shared state, handoff protocols, and inter-agent coordination | stub |
Some cards don't exist yet
A broken link is a placeholder for future learning, not an error.
Check your understanding
Test yourself (click to expand)
- Explain why a multi-agent system can outperform a single agent on broad research tasks, even when the single agent uses a more capable model.
- Name the four main architectural patterns for multi-agent systems and describe the coordination mechanism each uses.
- Distinguish between the supervisor/worker pattern and the swarm pattern. Under what conditions would you choose one over the other?
- Interpret this scenario: a multi-agent system for code review has five worker agents, but consistently produces contradictory findings (one agent says a function is secure, another says it has vulnerabilities). What is the most likely cause, and how would you fix it?
- Connect multi-agent systems to Anthropic’s principle of using the simplest architecture that works. What criteria would you use to decide whether a task genuinely requires multiple agents?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    AIML[AI and Machine Learning] --> AS[Agentic Systems]
    AS --> LLM[LLM Pipelines]
    AS --> ORCH[Orchestration]
    AS --> MAS[Multi-Agent Systems]
    AS --> TU[Tool Use]
    MAS --> SW[Supervisor-Worker]
    MAS --> SWARM[Swarm Architecture]
    MAS --> AC[Agent Communication]
    style MAS fill:#4a9ede,color:#fff
```
Related concepts:
- parallelisation — multi-agent systems use parallelisation internally, assigning independent sub-problems to specialised agents that work concurrently
- tool-use — each agent in a multi-agent system uses tools to interact with the world; tool design is as critical as agent design
- agent-memory — agents need memory to retain context across interactions; in multi-agent systems, shared vs isolated memory is a key design decision
- autonomy-spectrum — multi-agent systems sit at the highest end of the spectrum; each agent’s autonomy level must be calibrated to its role
- human-in-the-loop — multi-agent systems often include human checkpoints at critical decision points to maintain oversight and trust
Sources
Further reading
Resources
- How we built our multi-agent research system (Anthropic) — First-hand account of Anthropic’s production multi-agent architecture with benchmarks and engineering lessons
- Building Effective Agents (Anthropic) — The foundational reference on agentic workflow patterns, including the orchestrator-workers pattern and the principle of using the simplest architecture
- Agent Orchestration Patterns: Swarm vs Mesh vs Hierarchical (GuruSup) — Production-focused comparison of five orchestration patterns with a decision framework and comparison matrix
- Multi-Agent Architecture Patterns Guide 2026 (Paperclipped) — Synthesises Google, LangChain, and O’Reilly research into a practical pattern selection framework
- Single-Agent vs Multi-Agent Systems (The Thinking Company) — Clear analysis of when multi-agent adds value versus when a single agent is sufficient
Footnotes
1. Anthropic. (2025). How we built our multi-agent research system. Anthropic Engineering.
2. AI Workflow Lab. (2026). How to Build Multi-Agent AI Systems in 2026. AI Workflow Lab.
3. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic.
4. Paperclipped. (2026). Multi-Agent Architecture Patterns Guide 2026. Paperclipped.
5. GuruSup. (2026). Agent Orchestration Patterns: Swarm vs Mesh vs Hierarchical. GuruSup.
6. The Thinking Company. (2026). Single-Agent vs Multi-Agent AI Systems. The Thinking Company.
7. LlamaIndex. (2025). AgentWorkflow: Building Multi-Agent Systems. LlamaIndex Documentation.
8. Mullapudi, M. (2026). Supervisor and Hierarchical Multi-Agent Patterns. TutorialQ.
