Parallelisation

Running independent subtasks at the same time rather than one after another, then merging the results — trading compute cost for speed and robustness.


What is it?

prompt-chaining solves complex tasks by processing them in a strict sequence: step 1, then step 2, then step 3. That works when each step depends on the previous one. But many real tasks contain parts that are completely independent of each other — checking a document for factual accuracy has nothing to do with checking it for tone, and neither needs to wait for the other.

Parallelisation is the pipeline pattern that exploits this independence. Instead of processing subtasks one after another, you fan them out to run simultaneously and then gather the results into a single output.1 The total time drops from the sum of all subtasks to the duration of the slowest one.

Anthropic’s Building Effective Agents guide identifies two distinct sub-patterns within parallelisation:1

  • Sectioning — splitting a task into different independent subtasks that each handle a separate concern, running them in parallel, and merging the results.
  • Voting — running the same task multiple times (often with different prompts or model configurations) to get diverse outputs, then aggregating them for higher confidence.

Both sub-patterns share the same fan-out/gather structure, but they serve different purposes. Sectioning increases speed and specialisation. Voting increases reliability and confidence.

In plain terms

Imagine you need to clean your house before guests arrive. You could vacuum every room, then dust every room, then clean every bathroom — doing one task at a time across the whole house. Or you could ask three people to each take one room and do everything in that room simultaneously. The total work is the same, but the clock time drops dramatically because the rooms are independent.


At a glance


How does it work?

1. Sectioning — different jobs in parallel

Sectioning breaks a task into independent concerns, assigns each concern to a separate LLM call with a specialised prompt, and runs all calls concurrently.1 Each branch focuses on one aspect of the problem, which means its prompt can be optimised for that specific concern without competing objectives.

For example, when reviewing a piece of writing, you might run three parallel branches:

| Branch | Focus | Prompt optimised for |
| --- | --- | --- |
| Branch A | Factual accuracy | Verifying claims against source material |
| Branch B | Tone and style | Checking consistency with a style guide |
| Branch C | Structure and completeness | Ensuring all required sections are present |

Each branch produces its own assessment. An aggregation step then merges the three assessments into a single review.2
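In code, sectioning is a handful of concurrent calls, each with its own specialised prompt. A minimal sketch using `asyncio`, where `call_llm` is a hypothetical stand-in for a real model client:

```python
import asyncio

async def call_llm(prompt: str, text: str) -> str:
    # Hypothetical stand-in for a real model-client call.
    await asyncio.sleep(0)  # where the real network I/O would happen
    return f"assessment for: {prompt}"

async def review(draft: str) -> dict:
    # Fan-out: three branches, each prompt optimised for one concern.
    prompts = {
        "facts": "Verify every claim against the source material.",
        "tone": "Check consistency with the style guide.",
        "structure": "Confirm all required sections are present.",
    }
    # All three calls run concurrently; latency is roughly the slowest branch.
    results = await asyncio.gather(
        *(call_llm(p, draft) for p in prompts.values())
    )
    # Gather: merge the three assessments into a single review.
    return dict(zip(prompts, results))

assessments = asyncio.run(review("Quarterly report draft..."))
```

The aggregation here is a simple dictionary merge; a real pipeline might pass the three assessments to a final synthesis call instead.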

Think of it like...

A panel of specialists at a medical consultation. The cardiologist examines the heart, the neurologist examines the nervous system, and the radiologist reads the scans — all at the same time. No specialist waits for another to finish. At the end, they convene to combine their findings into a single diagnosis.

Anthropic specifically highlights guardrail implementation as a strong use case for sectioning: one model instance processes the user query while another simultaneously screens it for inappropriate content. This tends to perform better than having a single LLM call handle both the guardrail check and the core response, because the concerns compete for attention in a single prompt.1


2. Voting — same job, multiple attempts

Voting runs the same task multiple times — often with different prompts, different temperature settings, or even different models — and aggregates the outputs to reach a more confident result.1 Where sectioning divides the work, voting multiplies it for robustness.

The aggregation method depends on the task:

| Aggregation method | When to use | Example |
| --- | --- | --- |
| Majority vote | Binary or categorical decisions | 3 out of 5 reviewers flag content as inappropriate |
| Threshold vote | Balancing false positives and negatives | Require 4 out of 5 votes to block content |
| Best-of-N selection | Generative tasks with quality variation | Generate 3 drafts, score each, keep the best |
| Weighted average | When some evaluators are more reliable | Weight the specialist model’s vote higher than the generalist |

Anthropic highlights two voting examples: reviewing code for vulnerabilities with several different prompts that each flag problems independently, and evaluating content appropriateness with multiple prompts that use different vote thresholds to balance false positives and negatives.1
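The aggregation methods above are small, deterministic functions. A sketch of three of them in plain Python (the vote labels and the threshold of 4 are illustrative choices, not fixed values):

```python
from collections import Counter

def majority_vote(votes: list[str]) -> str:
    # Binary or categorical decisions: the most common label wins.
    return Counter(votes).most_common(1)[0][0]

def threshold_vote(votes: list[str], label: str = "block", threshold: int = 4) -> str:
    # Act only if at least `threshold` branches agree; raising the
    # threshold trades false positives for false negatives.
    return label if votes.count(label) >= threshold else "allow"

def best_of_n(candidates: list[str], score) -> str:
    # Generative tasks: score each draft and keep the best.
    return max(candidates, key=score)

votes = ["block", "allow", "block", "block", "allow"]
decision_majority = majority_vote(votes)    # "block": 3 of 5 agree
decision_threshold = threshold_vote(votes)  # "allow": only 3 of 5, need 4
```

Note how the same set of votes yields different outcomes under different aggregation rules; choosing the rule is part of designing the voting step.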

Think of it like...

A panel of judges scoring a gymnastics routine. Each judge scores independently, without seeing the others’ scores. The final score is an aggregate — typically dropping the highest and lowest and averaging the rest. No single judge’s bias can dominate the outcome, and the aggregate is more reliable than any individual score.


3. The fan-out/gather mechanic

Both sectioning and voting follow the same structural pattern:3

  1. Fan-out: The input is distributed to multiple parallel branches. In sectioning, each branch gets a different instruction. In voting, each branch gets the same (or similar) instruction.
  2. Parallel execution: All branches run concurrently. No branch depends on any other branch’s output.
  3. Gather: An aggregation step collects all branch outputs and combines them into a single result.

The gather step can be implemented as deterministic code (concatenation, majority vote, de-duplication) or as another LLM call that synthesises the branch outputs into a coherent whole. The choice depends on how much judgement the merging requires.2
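The three-step mechanic can be captured in one generic helper. A sketch, assuming each branch is an async callable and `combine` is either deterministic code or a wrapper around another LLM call:

```python
import asyncio

async def fan_out_gather(input_data, branches, combine):
    # 1. Fan-out and 2. parallel execution: every branch receives the
    #    input and runs concurrently; no branch sees another's output.
    outputs = await asyncio.gather(*(branch(input_data) for branch in branches))
    # 3. Gather: `combine` merges the outputs. It can be deterministic
    #    (vote, concatenation, de-duplication) or another LLM call.
    return combine(outputs)

# Toy branches standing in for LLM calls:
async def upper(x: str) -> str:
    return x.upper()

async def length(x: str) -> str:
    return str(len(x))

merged = asyncio.run(fan_out_gather("draft", [upper, length], " | ".join))
# merged == "DRAFT | 5"
```

`asyncio.gather` preserves input order, so the combine step can rely on output position matching branch position.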

Key distinction

Sectioning fans out different tasks to different branches. Voting fans out the same task to multiple branches. Both use the same fan-out/gather structure, but the fan-out logic and the gather logic differ.


4. The latency vs cost trade-off

Parallelisation is not free. Running three branches in parallel means three simultaneous LLM calls — tripling the compute cost compared to a single call. The benefit is that wall-clock time drops to the duration of the slowest branch rather than the sum of all branches.1
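The arithmetic is worth making explicit. With purely illustrative per-branch numbers (not measurements from any real model):

```python
# Illustrative latencies only: three independent branches.
branch_latencies = [2.1, 1.8, 2.4]   # seconds per branch

sequential_latency = sum(branch_latencies)  # about 6.3 s, one after another
parallel_latency = max(branch_latencies)    # 2.4 s, the slowest branch
calls_for_single_prompt = 1                 # one combined call handles everything
calls_for_parallel = len(branch_latencies)  # 3 calls, triple the compute cost
```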

This trade-off means parallelisation makes sense when:

  • Latency matters more than cost — the user is waiting for a response and you need to deliver faster
  • Quality matters more than cost — voting produces more reliable results than a single attempt, and the stakes justify the extra spend
  • Subtasks are genuinely independent — if branches depend on each other, you cannot run them in parallel (use prompt-chaining instead)

It does not make sense when:

  • Subtasks have sequential dependencies — step 2 needs step 1’s output
  • The task is simple enough for a single LLM call to handle reliably
  • Cost constraints are tight and the quality or speed gain does not justify multiplying the number of calls

Why do we use it?

Key reasons

1. Speed. When subtasks are independent, parallelisation reduces total processing time from the sum of all subtasks to the duration of the slowest one. For pipelines with multiple independent checks or analyses, this can cut latency dramatically.1

2. Specialisation. Each parallel branch can use a prompt tailored to its specific concern. A security review prompt does not need to share attention with a style review prompt. This focused attention improves the quality of each individual assessment.2

3. Robustness through diversity. Voting produces more reliable results than a single attempt by aggregating multiple independent assessments. A single LLM call might miss a vulnerability; three independent calls with different prompts are far less likely to all miss the same issue.1

4. Graceful degradation. If one parallel branch fails or times out, the other branches still produce their results. The system can deliver a partial result rather than failing entirely — the code review loses its style assessment but still reports security and performance findings.3
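With `asyncio`, graceful degradation falls out of `return_exceptions=True`: a failed branch comes back as an exception object instead of sinking the whole gather. A sketch of the code-review example, with a stub `check` in place of real LLM calls:

```python
import asyncio

async def check(name: str, fail: bool = False) -> str:
    # Stub for a per-concern review branch; `fail` simulates a timeout.
    if fail:
        raise RuntimeError(f"{name} branch timed out")
    return f"{name}: ok"

async def run_review() -> dict:
    names = ["security", "performance", "style"]
    # return_exceptions=True keeps one failing branch from
    # cancelling the others' results.
    results = await asyncio.gather(
        check("security"), check("performance"), check("style", fail=True),
        return_exceptions=True,
    )
    # Keep only the branches that succeeded.
    return {n: r for n, r in zip(names, results)
            if not isinstance(r, Exception)}

partial = asyncio.run(run_review())
# The style branch fails, but security and performance findings survive.
```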


When do we use it?

  • When a task has multiple independent concerns that do not depend on each other (review dimensions, guard rails, evaluation criteria)
  • When latency is a constraint and you need results faster than sequential processing allows
  • When you need higher confidence in a decision and can afford to run the task multiple times (voting)
  • When different expertise is needed for different aspects of the same input (specialised prompts per branch)
  • When building guardrails that should run alongside the main task rather than before or after it

Rule of thumb

If you can describe the subtasks as “check A and check B and check C” where none depends on the others, parallelise. If it is “do A, then use A’s result to do B,” that is a chain, not a parallel task.1


How can I think about it?

The newspaper desk

A newspaper editor receives a breaking story and needs it ready for print fast.

  • Fact-checker verifies every claim against sources
  • Copy editor fixes grammar, spelling, and style
  • Photo editor selects and crops the accompanying images
  • Layout designer prepares the page template

All four work simultaneously on different aspects of the same story. None needs to wait for the others. When all four are done, their work is merged into the final page. The story reaches print in the time it takes the slowest editor, not the sum of all four.

This is sectioning: different specialists, same input, parallel execution, merged output.

The taste-test panel

A food company testing a new recipe does not rely on one taster’s opinion. They assemble a panel of 10 tasters who each evaluate the recipe independently.

  • Each taster scores the same dish on flavour, texture, and appearance
  • No taster sees the others’ scores until all have submitted
  • The final assessment is the aggregate of all scores
  • Outlier scores (one person hates cilantro) are diluted by the majority

This is voting: same task, multiple independent attempts, aggregated result. The panel’s collective judgement is more reliable than any single taster, and the process guards against individual bias.


Concepts to explore next

| Concept | What it covers | Status |
| --- | --- | --- |
| evaluator-optimiser | Using one LLM to generate and another to critique in an iterative refinement loop | complete |
| orchestration | How agents and pipeline stages are coordinated and managed | stub |
| multi-agent-systems | Architectures where multiple specialised agents collaborate on a task | stub |

Some cards don't exist yet

A broken link is a placeholder for future learning, not an error.


Check your understanding


Where this concept fits

Position in the knowledge graph

graph TD
    AS[Agentic Systems] --> LP[LLM Pipelines]
    LP --> PC[Prompt Chaining]
    LP --> PR[Prompt Routing]
    LP --> PAR[Parallelisation]
    LP --> EO[Evaluator-Optimiser]
    LP --> CC[Context Cascading]
    LP --> RAG[RAG]
    style PAR fill:#4a9ede,color:#fff

Related concepts:

  • orchestration — orchestration manages which agents run and when; parallelisation is one execution strategy an orchestrator might use for independent subtasks
  • evaluator-optimiser — where parallelisation runs tasks simultaneously for speed or confidence, the evaluator-optimiser runs tasks iteratively for quality refinement
  • multi-agent-systems — multi-agent architectures often use parallelisation internally, assigning independent sub-problems to specialised agents that work concurrently

Sources


Further reading

Resources

Footnotes

  1. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic.

  2. Carpintero, D. (2025). Design Patterns for Building Agentic Workflows. Hugging Face.

  3. CallSphere. (2026). Parallel Fan-Out Fan-In Patterns: Processing Multiple Sub-Tasks Simultaneously. CallSphere.