Multi-Agent Systems

Architectures where multiple AI agents — each with a specialised role — collaborate to accomplish tasks that exceed what any single agent could handle alone.


What is it?

A single AI agent can reason, use tools, and follow a plan. But some tasks are too broad, too complex, or too time-consuming for one agent working alone. A multi-agent system solves this by splitting work across multiple specialised agents that collaborate — each focusing on a narrow part of the problem while a coordination mechanism ensures their efforts combine into a coherent result.1

The idea is not new. Human organisations have always worked this way: a hospital has specialists who each handle one aspect of a patient’s care, a newsroom has reporters, editors, and fact-checkers working in parallel, and a software company has designers, engineers, and testers with distinct responsibilities. Multi-agent AI systems apply the same principle of division of labour to AI agents.2

The parent concept, agentic-systems, places multi-agent coordination at the highest level of the autonomy spectrum: beyond a single tool-using agent, you reach systems where agents delegate to other agents. The orchestration concept provides the coordination layer, and llm-pipelines provide the internal structure of each agent’s workflow. Multi-agent systems combine both: each agent runs its own pipeline internally, and orchestration manages how agents interact externally.3

A crucial principle from Anthropic’s research: use the simplest architecture that works.3 A single agent with good tool access solves most tasks. Multi-agent systems earn their complexity only when the task genuinely requires parallel exploration, exceeds a single agent’s context window, or demands specialised capabilities that no single prompt can cover. Google’s research reports 39 to 70% performance degradation when teams add agents beyond what a task requires.4

In plain terms

A single agent is like one person doing all the work on a project. A multi-agent system is like a team — one person researches, another writes, a third reviews. Each person is a specialist, and someone coordinates so the pieces fit together. The team finishes faster and produces better work than any one person could alone, but only if the project is big enough to justify the coordination overhead.


How does it work?

Multi-agent systems are built from a small set of architectural patterns that determine how agents coordinate. Each pattern makes different trade-offs between control, speed, flexibility, and debuggability.5

1. Supervisor/Worker — centralised delegation

The most common production pattern. A single supervisor agent receives a task, decomposes it into subtasks, delegates each to a specialised worker agent, monitors progress, and assembles the final output. Workers do not communicate with each other — all coordination flows through the supervisor.5

The supervisor holds the global context: it knows the overall goal, tracks which subtasks are complete, and decides when the work is done. Workers are narrowly focused — one handles database queries, another writes code, a third calls external APIs. This is Anthropic’s “orchestrator-workers” pattern: the key difference from a fixed pipeline is that the subtasks are not predefined but determined dynamically by the supervisor based on the specific input.3

| Component | Role |
| --- | --- |
| Supervisor | Decomposes task, delegates, monitors, synthesises |
| Workers | Execute narrow subtasks independently |
| Communication | Hub-and-spoke: all flows through supervisor |
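The hub-and-spoke flow described above can be sketched in a few lines. This is a minimal illustration, not a production design: the worker functions, the `plan` decomposition, and the retry policy are all hypothetical stand-ins, and a real supervisor would use an LLM call to decide the subtasks dynamically.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical workers: in a real system each would wrap an LLM call
# with its own prompt and tools. Here they are plain stubs.
def query_worker(subtask: str) -> str:
    return f"[db] rows for: {subtask}"

def code_worker(subtask: str) -> str:
    return f"[code] patch for: {subtask}"

WORKERS: dict[str, Callable[[str], str]] = {
    "query": query_worker,
    "code": code_worker,
}

@dataclass
class Subtask:
    worker: str        # which worker should handle this subtask
    description: str   # what the worker is asked to do

def plan(task: str) -> list[Subtask]:
    """Stand-in for the supervisor's decomposition step (hard-coded here)."""
    return [
        Subtask("query", f"find records relevant to '{task}'"),
        Subtask("code", f"write a fix for '{task}'"),
    ]

def supervise(task: str, max_retries: int = 1) -> str:
    """Decompose, delegate, retry failed workers, and assemble the output."""
    results: list[str] = []
    for sub in plan(task):
        for attempt in range(max_retries + 1):
            try:
                results.append(WORKERS[sub.worker](sub.description))
                break
            except Exception as exc:
                # Only the supervisor sees failures; workers never talk to each other.
                if attempt == max_retries:
                    results.append(f"[failed] {sub.description}: {exc}")
    return "\n".join(results)

if __name__ == "__main__":
    print(supervise("slow login"))
```

Note that every worker result passes through `supervise`, which is what makes the control flow easy to trace and also what makes the supervisor a bottleneck.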

Think of it like...

A film director on set. The director does not operate the camera, act in scenes, or edit footage. They hold the overall vision, decide what each specialist does, review results, and adjust the plan when something is not working. The camera operators, actors, and editors are the workers — each excellent at one thing, coordinated by the director’s decisions.

Strengths: Easy to debug (single control flow), clear audit trail, natural error recovery (supervisor retries failed workers).

Weaknesses: Supervisor is a single point of failure and a throughput bottleneck. It must hold all worker results in its context window, which limits scalability for tasks producing many intermediate outputs.5

Concept to explore

See supervisor-worker for a deeper dive into task decomposition strategies, worker design, and scaling considerations.


2. Swarm/Peer-to-peer — emergent coordination

The swarm pattern eliminates centralised control. Agents operate as autonomous peers that make local decisions based on shared state or handoff protocols. There is no supervisor. Coordination emerges from simple local rules applied by many agents simultaneously — the same principle behind ant colonies, bird flocks, and distributed consensus.5

In practice, swarm agents share a blackboard (a shared memory or state store) and use handoff protocols to transfer tasks. Each agent has a set of capabilities and can hand off to another agent when it encounters work outside its specialisation. The active agent responds directly to the user or the next agent — no intermediary reprocesses the output.6
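A minimal sketch of the blackboard-plus-handoff idea follows, assuming a plain dictionary as the shared state and a convention where each agent returns the name of the next agent (or `None` when it has answered). The agent names, the refund scenario, and the `max_handoffs` cap are illustrative assumptions, not part of any specific framework.

```python
from typing import Callable, Optional

Blackboard = dict[str, str]
# An agent reads and writes the blackboard, then names the next agent or returns None.
Agent = Callable[[Blackboard], Optional[str]]

def billing_agent(board: Blackboard) -> Optional[str]:
    if "refund" in board["request"]:
        board["billing"] = "refund approved"
        return "shipping"  # hand off: return logistics is outside billing's specialisation
    board["reply"] = "billing question answered"
    return None

def shipping_agent(board: Blackboard) -> Optional[str]:
    board["shipping"] = "return label created"
    # The active agent replies directly; no intermediary reprocesses its output.
    board["reply"] = f"{board['billing']}; {board['shipping']}"
    return None

AGENTS: dict[str, Agent] = {"billing": billing_agent, "shipping": shipping_agent}

def run_swarm(request: str, start: str, max_handoffs: int = 5) -> Blackboard:
    """Run agents until one stops handing off, or the handoff cap is reached."""
    board: Blackboard = {"request": request}
    current: Optional[str] = start
    for _ in range(max_handoffs + 1):
        if current is None:
            break
        current = AGENTS[current](board)
    return board

if __name__ == "__main__":
    print(run_swarm("refund for order 12", start="billing")["reply"])
```

The cap on handoffs is one simple guard against the unpredictable convergence that peer-to-peer coordination can show; there is no supervisor in the loop to stop a cycle otherwise.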

LangChain’s benchmarks show that swarm patterns use roughly 40% fewer tokens than supervisor patterns and achieve faster end-to-end response times, because they eliminate the “translation overhead” of a supervisor reprocessing every worker’s output.4 However, the supervisor pattern provides stronger guarantees for reliability, error recovery, and auditability.

Think of it like...

A jazz ensemble improvising. No conductor tells each musician when to play. Instead, each musician listens to the others, knows when to take the lead, and hands the spotlight to the next player when the moment feels right. The music emerges from local awareness and mutual response, not from a central plan. The result can be brilliant, but it is harder to predict or debug than an orchestrated piece.

Strengths: No single point of failure, high scalability (no coordination bottleneck), faster for exploratory tasks where the optimal path is unknown.

Weaknesses: Hard to observe and debug (requires distributed tracing), no global arbiter to enforce sequencing, convergence can be unpredictable.5

Concept to explore

See swarm-architecture for handoff protocols, shared state design, and convergence strategies.


3. Hierarchical decomposition — agents as tools

The hierarchical pattern organises agents into a tree structure with multiple levels of delegation. A top-level manager agent delegates to mid-level supervisors, who delegate to leaf-level workers. Each level adds a layer of abstraction: the top level reasons about strategy, mid-levels reason about tactics, and workers execute specific actions.5

This is also expressed as the agents-as-tools pattern: higher-level agents treat lower-level agents as callable tools, invoking them for sub-problems just as they would invoke an API or database query. This creates layered control and modularity — a higher-level agent does not need to know how a sub-agent accomplishes its task, only that it returns a result in the expected format.3

The critical advantage is context window management. No single agent needs to hold the full context of the entire problem. The top-level agent holds the high-level objective and summary results. Mid-level agents hold their team’s context. Workers hold only their specific input. This allows hierarchical systems to tackle problems that would overflow any single agent’s context window.5
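The context-management point can be illustrated with a toy three-level tree in which each level calls the level below like a tool and passes only a summary upward. Everything here is a hypothetical sketch: the "work" is a word count, and the summaries are formatted strings rather than LLM output.

```python
def worker(document: str) -> str:
    """Leaf agent: sees only its own input. Stubbed as a word count."""
    return f"{len(document.split())} words"

def mid_level(team: str, documents: list[str]) -> str:
    """Mid-level supervisor: invokes workers like tools, returns a one-line summary."""
    findings = [worker(doc) for doc in documents]
    return f"{team}: {len(findings)} docs ({', '.join(findings)})"

def top_level(objective: str, teams: dict[str, list[str]]) -> str:
    """Top-level manager: never sees raw documents, only team summaries."""
    summaries = [mid_level(name, docs) for name, docs in teams.items()]
    return f"{objective} -> " + " | ".join(summaries)

if __name__ == "__main__":
    print(top_level("audit", {"legal": ["a b c", "d e"], "finance": ["x"]}))
```

The top-level function holds only short summaries regardless of how large the documents are, which is the property that keeps each context small. The same compression is also where detail can be lost on the way up.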

Strengths: Scales to large problems (20+ agents), manages context effectively, mirrors organisational structures.

Weaknesses: Latency accumulates at each level (2+ seconds per LLM call per level). Information is lost at each summarisation step: a nuanced worker finding may be compressed to a single sentence by the supervisor.5


4. Sequential handoff — the relay race

A chain of specialised agents where each passes control to the next. Agent 1 completes its work and hands off to Agent 2, which completes its work and hands off to Agent 3. Each agent has full control during its turn and passes a structured output to its successor.7

This differs from a simple pipeline in a critical way: each stage is a full agent with autonomous reasoning and tool use, not just a prompted LLM call. The handoff includes context, intermediate results, and sometimes instructions for the next agent. LlamaIndex’s AgentWorkflow implements this pattern with explicit handoff directives that specify which agent receives control and what context it needs.7

For example, a content creation workflow might use:

| Agent | Role | Handoff output |
| --- | --- | --- |
| Research agent | Gathers sources and extracts key facts | Structured fact sheet |
| Drafting agent | Writes the article from the fact sheet | Draft document |
| Review agent | Checks accuracy, style, completeness | Annotated draft with corrections |
| Publishing agent | Formats and publishes the final version | Published article |
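The content workflow above might be sketched as a chain of functions that pass one structured payload along. This is an illustrative sketch only: the `Handoff` dataclass and the four stub agents are hypothetical, and it does not use or imitate the LlamaIndex AgentWorkflow API.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Handoff:
    """Structured payload passed from one agent to the next."""
    topic: str
    facts: list[str] = field(default_factory=list)
    draft: str = ""
    corrections: list[str] = field(default_factory=list)
    published: bool = False

def research_agent(h: Handoff) -> Handoff:
    h.facts = [f"{h.topic} fact A", f"{h.topic} fact B"]  # stub for source gathering
    return h

def drafting_agent(h: Handoff) -> Handoff:
    h.draft = " ".join(h.facts)  # stub for writing from the fact sheet
    return h

def review_agent(h: Handoff) -> Handoff:
    # Flag any fact from the fact sheet that the draft failed to include.
    h.corrections = [f"missing: {fact}" for fact in h.facts if fact not in h.draft]
    return h

def publishing_agent(h: Handoff) -> Handoff:
    h.published = not h.corrections  # publish only a draft with no open corrections
    return h

CHAIN: list[Callable[[Handoff], Handoff]] = [
    research_agent, drafting_agent, review_agent, publishing_agent,
]

def run_chain(topic: str) -> Handoff:
    handoff = Handoff(topic=topic)
    for agent in CHAIN:  # fixed order: each agent has full control during its turn
        handoff = agent(handoff)
    return handoff

if __name__ == "__main__":
    print(run_chain("swarm"))
```

Typing the handoff payload makes a "fumbled baton" visible early: a stage that forgets to fill a field leaves an obvious empty default rather than silently dropping context.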

Think of it like...

A relay race. Each runner (agent) sprints their leg of the race and passes the baton (context and results) to the next. Each runner is a specialist in their segment. The handoff is the critical moment — a fumbled baton (lost context, poor formatting) costs more time than a slow leg.

Strengths: Predictable execution order, clear stage boundaries, easy to monitor and debug. Each agent can use a model and prompt optimised for its specific role.

Weaknesses: Cannot handle tasks where execution order depends on intermediate results. Total latency is the sum of all stages.5


Why multi-agent? The case for and against

Multi-agent systems solve specific problems that single agents cannot address efficiently:1

  • Division of labour. Different parts of a problem require different expertise, tools, and prompts. Specialised agents outperform generalist agents on focused tasks.
  • Context window limits. A single agent’s context window is finite. Multi-agent systems distribute information across separate context windows, enabling work on problems that exceed any single agent’s capacity.1
  • Parallelism. Independent subtasks run simultaneously, reducing total time from the sum of all tasks to the duration of the slowest one. See parallelisation for the underlying pattern.
  • Scalability. Adding a new capability means adding a new agent, not rewriting the entire system.
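The parallelism claim (total time drops from the sum of the subtasks to roughly the slowest one) can be checked with a small `asyncio` sketch. The subagents here are hypothetical stubs whose sleep durations stand in for LLM and tool latency; coordination overhead is not modelled.

```python
import asyncio
import time

async def subagent(name: str, seconds: float) -> str:
    """Hypothetical subagent; the sleep stands in for LLM and tool latency."""
    await asyncio.sleep(seconds)
    return f"{name} done"

async def run_sequential(jobs: list[tuple[str, float]]) -> list[str]:
    # One subtask at a time: total time is about the sum of all durations.
    return [await subagent(name, secs) for name, secs in jobs]

async def run_parallel(jobs: list[tuple[str, float]]) -> list[str]:
    # All subtasks at once: total time is about the slowest duration.
    return list(await asyncio.gather(*(subagent(n, s) for n, s in jobs)))

def timed(runner, jobs: list[tuple[str, float]]) -> tuple[list[str], float]:
    start = time.perf_counter()
    results = asyncio.run(runner(jobs))
    return results, time.perf_counter() - start

JOBS = [("search", 0.10), ("summarise", 0.20), ("verify", 0.30)]

if __name__ == "__main__":
    for runner in (run_sequential, run_parallel):
        results, elapsed = timed(runner, JOBS)
        print(f"{runner.__name__}: {elapsed:.2f}s {results}")
```

With these durations the sequential run should take roughly 0.6 s and the parallel run roughly 0.3 s. The gain only appears because the three jobs are independent; subtasks that depend on each other's output would still have to run in order.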

But multi-agent is not always the answer. Anthropic’s core principle is clear: start with the simplest solution that works, and add complexity only when necessary.3

When NOT to use multi-agent

  • The task fits within a single agent’s context window and capability set
  • Subtasks have heavy sequential dependencies (multi-agent adds coordination overhead without parallel benefit)
  • The cost of coordination exceeds the benefit of specialisation
  • A single agent with good tool access already solves the problem reliably

Google’s research reports 39 to 70% performance degradation when teams add agents beyond what the task requires. Most production failures are coordination problems, not capability problems, and adding more agents to a coordination problem makes it worse.4


Inter-agent communication patterns

How agents share information determines the system’s behaviour as much as the agents themselves.8

Shared state (blackboard pattern). All agents read from and write to a common state store. Simple to implement, but risks contention and requires careful concurrency management. Used in swarm architectures where agents discover and respond to each other’s work through the shared state.

Message passing. Agents send structured messages directly to each other or through a message broker. More complex than shared state, but enables loose coupling and asynchronous operation. Used in peer-to-peer (mesh) and hierarchical architectures.

Handoff directives. An agent explicitly transfers control to another agent, passing along context and instructions. The simplest form of inter-agent communication. Used in sequential handoff and swarm patterns.
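Of the three mechanisms, message passing through a broker is the one most easily shown in isolation. The sketch below assumes a tiny in-memory broker with one inbox per agent; the `Message` fields, the `Broker` class, and the agent names are hypothetical, and a real deployment would more likely use an external queue or message bus.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Optional

@dataclass
class Message:
    sender: str
    recipient: str
    body: str

class Broker:
    """In-memory message broker: one FIFO inbox per agent name."""

    def __init__(self) -> None:
        self.inboxes: dict[str, deque[Message]] = defaultdict(deque)

    def send(self, msg: Message) -> None:
        self.inboxes[msg.recipient].append(msg)

    def receive(self, agent: str) -> Optional[Message]:
        inbox = self.inboxes[agent]
        return inbox.popleft() if inbox else None

def writer_step(broker: Broker) -> None:
    """Hypothetical writer agent: consume one message and reply to its sender."""
    msg = broker.receive("writer")
    if msg is not None:
        broker.send(Message("writer", msg.sender, f"drafted from: {msg.body}"))

if __name__ == "__main__":
    broker = Broker()
    broker.send(Message("researcher", "writer", "3 sources found"))
    writer_step(broker)
    print(broker.receive("researcher"))
```

The sender never calls the writer directly, which is the loose coupling the pattern is chosen for: either side can be replaced or run later as long as the message shape stays the same.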

The shared vs isolated context trade-off

Shared context means all agents see the same information — good for coherence, bad for scalability (context window bloat and attention dilution). Isolated context means each agent has only what it needs — good for focus and scalability, bad for coherence (agents may produce contradictory outputs). Most production systems use a hybrid: agents have isolated working context but share results through a structured aggregation layer.1
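One way to picture the hybrid approach is an aggregation step that merges structured findings from agents that never saw each other's context, and flags contradictions instead of silently overwriting them. The `Finding` shape and the keep-first conflict policy are assumptions made for this sketch; a real system might escalate conflicts to a reviewer agent or a human.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Finding:
    """Structured result reported by one agent from its isolated context."""
    agent: str
    key: str
    value: str

def aggregate(findings: list[Finding]) -> tuple[dict[str, str], list[str]]:
    """Merge findings by key; keep the first value and record conflicting keys."""
    merged: dict[str, str] = {}
    conflicts: list[str] = []
    for finding in findings:
        if finding.key not in merged:
            merged[finding.key] = finding.value
        elif merged[finding.key] != finding.value and finding.key not in conflicts:
            conflicts.append(finding.key)
    return merged, conflicts

if __name__ == "__main__":
    findings = [
        Finding("agent_a", "launch_year", "2021"),
        Finding("agent_b", "ceo", "Kim"),
        Finding("agent_c", "launch_year", "2022"),  # contradicts agent_a
    ]
    print(aggregate(findings))
```

Because only small structured records cross the boundary, each agent keeps a focused working context while the aggregation layer carries the responsibility for coherence.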


Why do we use it?

Key reasons

1. Tasks exceed single-agent capacity. Some problems are too broad for one agent’s context window, too complex for one prompt, or too time-consuming for sequential processing. Multi-agent systems distribute the work across agents that each handle a manageable piece.1

2. Specialisation improves quality. A security-focused agent with a security-optimised prompt outperforms a generalist agent trying to check security, performance, and style in one pass. Each agent can use tailored prompts, tools, and even different models suited to its specific task.3

3. Parallel execution saves time. Anthropic’s multi-agent research system cut research time by up to 90% for complex queries by running subagents in parallel instead of sequentially.1

4. Modularity enables evolution. Adding a new capability means adding a new agent, not rewriting the system. Replacing one agent (upgrading its model, changing its tools) does not affect the others. This mirrors the microservices principle in software architecture.5


When do we use it?

  • When the task requires parallel exploration of multiple independent directions (research, due diligence, competitive analysis)
  • When the problem exceeds a single context window — too many documents, too much code, too many data sources for one agent to hold
  • When different parts of the task require different specialised tools or expertise that cannot be effectively combined in one prompt
  • When you need to scale a workflow by adding capacity (more agents) without redesigning the system
  • When latency matters and independent subtasks can run simultaneously

Rule of thumb

If a single agent with good tool access can solve the task reliably within its context window, use a single agent. Reach for multi-agent only when the task genuinely requires parallel work, exceeds context limits, or demands specialisation that one prompt cannot cover.3


How can I think about it?

The hospital emergency department

An emergency department is a multi-agent system where every professional has a specialised role.

  • The triage nurse = the routing agent. Assesses incoming patients and decides who handles each case. Does not treat patients — classifies and delegates.
  • The ER doctor = the supervisor agent. Holds the overall picture for a patient, makes high-level decisions, delegates specific tasks.
  • The radiologist = a specialist worker agent. Called when imaging is needed, operates independently with their own tools (scanners), returns structured results (scan reports) to the supervising doctor.
  • The lab technician = another specialist worker. Runs blood tests in parallel with the radiologist’s work. Neither waits for the other.
  • The patient handoff between shifts = sequential handoff. The outgoing doctor passes context (charts, status, plan) to the incoming doctor, who takes full control.
  • The hospital’s shared medical record = the blackboard (shared state). Every specialist reads and writes to the same record, maintaining coherence across the team.

No single doctor handles every aspect of care. The system’s quality comes from specialisation, parallel work, and coordination — the same properties that make multi-agent AI systems effective.

The film production crew

A film set is a multi-agent system with clear hierarchical delegation and parallel execution.

  • The director = the top-level supervisor. Holds the creative vision, decomposes each scene into shots, delegates execution.
  • The assistant directors = mid-level supervisors. Each manages a domain: one coordinates actors, another manages extras, a third handles logistics.
  • Camera operators, sound engineers, lighting technicians = specialist workers. Each has narrow expertise and specialised tools. They work in parallel during each take.
  • The script supervisor = the shared state manager. Tracks continuity across takes and scenes so that independently filmed shots will fit together coherently.
  • Post-production handoff = sequential handoff. Filmed footage passes from the set to editing, then to sound design, then to colour grading. Each stage is handled by a different specialist team.
  • The wrap meeting = the aggregation step. The director reviews all work, identifies gaps, and decides whether reshoots (retries) are needed.

The film’s quality comes from each specialist doing one thing well, the director coordinating the whole, and the production system ensuring nothing falls through the cracks.


Concepts to explore next

| Concept | What it covers | Status |
| --- | --- | --- |
| supervisor-worker | Task decomposition, worker design, and the orchestrator-worker pattern in depth | stub |
| swarm-architecture | Decentralised coordination, handoff protocols, and convergence strategies | stub |
| agent-communication | Message passing, shared state, handoff protocols, and inter-agent coordination | stub |

Some cards don't exist yet

A broken link is a placeholder for future learning, not an error.


Where this concept fits

Position in the knowledge graph

```mermaid
graph TD
    AIML[AI and Machine Learning] --> AS[Agentic Systems]
    AS --> LLM[LLM Pipelines]
    AS --> ORCH[Orchestration]
    AS --> MAS[Multi-Agent Systems]
    AS --> TU[Tool Use]
    MAS --> SW[Supervisor-Worker]
    MAS --> SWARM[Swarm Architecture]
    MAS --> AC[Agent Communication]
    style MAS fill:#4a9ede,color:#fff
```

Related concepts:

  • parallelisation — multi-agent systems use parallelisation internally, assigning independent sub-problems to specialised agents that work concurrently
  • tool-use — each agent in a multi-agent system uses tools to interact with the world; tool design is as critical as agent design
  • agent-memory — agents need memory to retain context across interactions; in multi-agent systems, shared vs isolated memory is a key design decision
  • autonomy-spectrum — multi-agent systems sit at the highest end of the spectrum; each agent’s autonomy level must be calibrated to its role
  • human-in-the-loop — multi-agent systems often include human checkpoints at critical decision points to maintain oversight and trust

Sources


Footnotes

  1. Anthropic. (2025). How we built our multi-agent research system. Anthropic Engineering.

  2. AI Workflow Lab. (2026). How to Build Multi-Agent AI Systems in 2026. AI Workflow Lab.

  3. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic.

  4. Paperclipped. (2026). Multi-Agent Architecture Patterns Guide 2026. Paperclipped.

  5. GuruSup. (2026). Agent Orchestration Patterns: Swarm vs Mesh vs Hierarchical. GuruSup.

  6. The Thinking Company. (2026). Single-Agent vs Multi-Agent AI Systems. The Thinking Company.

  7. LlamaIndex. (2025). AgentWorkflow: Building Multi-Agent Systems. LlamaIndex Documentation.

  8. Mullapudi, M. (2026). Supervisor and Hierarchical Multi-Agent Patterns. TutorialQ.