LLM Pipelines
A multi-step workflow where a language model receives input, processes it through a sequence of stages, and produces output — turning a single interaction into a structured chain of transformations.
What is it?
When most people first use a large language model, they type a single prompt and get a single response. That works for simple tasks — summarising a paragraph, answering a factual question, drafting a short email. But as soon as the task involves multiple phases (research, then analysis, then writing), multiple concerns (accuracy, tone, format), or multiple tools (search, database, code execution), a single prompt-response exchange starts to break down.1
An LLM pipeline is the alternative. Instead of asking the model to do everything in one shot, you decompose the task into discrete stages, where each stage has a focused job and passes its output to the next stage as input. The model runs multiple times — once per stage — with each run receiving only the context it needs for that particular step.2
This is not a new idea. Software engineering has used pipelines for decades (Unix pipes, ETL processes, CI/CD pipelines). The LLM version applies the same principle: break a complex process into small, composable, independently testable steps. What makes LLM pipelines distinctive is that the “processing” at each stage is done by a language model — reasoning, generating, evaluating, transforming — rather than by deterministic code alone.3
The parent concept, agentic-systems, frames LLM pipelines as one of two main building blocks (alongside orchestration). Where orchestration concerns how agents are coordinated and managed, pipelines concern the internal structure of how work flows through a system — the sequence of transformations that turns a request into a result.
In plain terms
A single prompt is like asking someone one question and expecting a perfect answer. A pipeline is like giving someone a checklist: first gather the facts, then analyse them, then write up the findings, then check for errors. Each step is simpler, and you can verify the result before moving on.
At a glance
From single prompt to multi-stage pipeline
```mermaid
graph LR
    subgraph Single Prompt
        A1[Input] --> B1[LLM] --> C1[Output]
    end
    subgraph Pipeline
        A2[Input] --> B2[Stage 1 - Extract] --> G[Gate]
        G -->|pass| D2[Stage 2 - Reason]
        D2 --> E2[Stage 3 - Generate]
        E2 --> F2[Output]
        G -->|fail| B2
    end
```

Key: The single-prompt approach asks the model to do everything at once. The pipeline breaks the task into stages with validation gates between them. Each stage has a narrow job, and the gate checks quality before passing output forward. If a gate fails, the stage can be retried or repaired without restarting the entire process.
How does it work?
LLM pipelines are built from a small set of patterns that can be combined. Anthropic’s research on building effective agents identifies these as the foundational workflow patterns that underpin most production systems.2
1. Prompt chaining — the sequential pattern
The simplest pipeline is a straight line: the output of one LLM call becomes the input of the next. Each step focuses on a single sub-task, and you can insert programmatic checks (gates) between steps to validate that the process stays on track.2
For example, a content creation pipeline might look like:
| Step | Task | Output |
|---|---|---|
| 1 | Research and extract key facts from source material | Structured fact list |
| 2 | Generate an outline based on the facts | Document outline |
| 3 | Gate: check that the outline covers all required topics | Pass/fail decision |
| 4 | Write the full draft from the outline | Draft document |
| 5 | Review and edit for tone, accuracy, and style | Final document |
Each step is easier for the model than doing everything at once. And critically, each intermediate output is an inspectable artifact — if step 4 produces a bad draft, you can examine the outline from step 2 to see whether the problem originated there.4
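The chain above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real implementation: `call_llm` is a stubbed placeholder for an actual model API call, and the gate is a plain programmatic check between stages.

```python
# Minimal prompt-chain sketch with a validation gate between stages.
# `call_llm` is a hypothetical stand-in for a real model call; here it
# echoes its prompt so the control flow can run end to end.

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a model API here.
    return "stub output:\n" + prompt

def extract_facts(source: str) -> str:
    return call_llm("Extract the key facts from:\n" + source)

def make_outline(facts: str) -> str:
    return call_llm("Create an outline from these facts:\n" + facts)

def gate_outline(outline: str, required_topics: list) -> bool:
    # Programmatic gate: pass only if every required topic is mentioned.
    return all(topic.lower() in outline.lower() for topic in required_topics)

def write_draft(outline: str) -> str:
    return call_llm("Write a full draft following this outline:\n" + outline)

def run_pipeline(source: str, required_topics: list) -> str:
    facts = extract_facts(source)          # step 1
    outline = make_outline(facts)          # step 2
    if not gate_outline(outline, required_topics):  # step 3: gate
        raise ValueError("Gate failed: outline is missing required topics")
    return write_draft(outline)            # step 4

```

Because the gate raises instead of silently passing bad output forward, a failure points directly at the stage that produced it.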
Think of it like...
An assembly line in a factory. Each station has one job (cut, weld, paint, inspect). No station tries to build the entire product. The product gains value at each station, and quality checks between stations catch defects early — before more work is wasted on a flawed component.
Concept to explore
See context-cascading for how context is layered from general to specific across pipeline stages, ensuring each stage receives the right information without overloading the model’s context window.
2. Routing — the branching pattern
Not every input should follow the same path. A routing step classifies the input and directs it to a specialised downstream handler. This allows different types of requests to receive different treatment without one handler trying to be good at everything.2
For example: a customer service pipeline might classify incoming messages as billing questions, technical issues, or general inquiries, then route each to a stage with specialised instructions and tools for that category.
Routing can be implemented through rule-based logic (keyword matching, regex), LLM-based classification (asking the model to categorise the input), or a hybrid of both.5
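A hybrid router can be sketched as rules first, model fallback second. The categories, keywords, and the stubbed `llm_classify` function below are illustrative assumptions, not a real classification API:

```python
import re

# Hybrid router: cheap rule-based matching handles the obvious cases,
# with an LLM classifier (stubbed here) as fallback for inputs the
# rules miss.

RULES = {
    "billing": re.compile(r"\b(invoice|refund|charge|payment)\b", re.I),
    "technical": re.compile(r"\b(error|crash|bug|install)\b", re.I),
}

def llm_classify(message: str) -> str:
    # Placeholder for a model call such as:
    #   "Classify this message as billing, technical, or general: ..."
    return "general"

def route(message: str) -> str:
    for category, pattern in RULES.items():
        if pattern.search(message):
            return category          # fast, deterministic path
    return llm_classify(message)     # fall back to the model
```

The deterministic path costs nothing per request; only ambiguous messages pay for a model call.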
Think of it like...
A hospital triage desk. The triage nurse does not treat patients — they assess the situation and direct each patient to the right department (emergency, outpatient, specialist). The nurse’s job is fast classification; the department’s job is specialised treatment.
Concept to explore
See prompt-routing for a deeper dive into how routing decisions are made and the trade-offs between rule-based and LLM-based classification.
3. Parallelisation — the divide-and-conquer pattern
When a task has independent sub-tasks, you can run them simultaneously rather than sequentially. This comes in two forms: sectioning (splitting the work into different sub-tasks that run in parallel) and voting (running the same task multiple times with different approaches to get diverse outputs).2
For example: when reviewing a document, one parallel branch could check for factual accuracy while another checks for tone and style. Neither depends on the other, so they run concurrently, and their results are aggregated at the end.
Example: parallel code review
Consider a code review pipeline:
| Branch | Focus | Output |
|---|---|---|
| Branch A | Security vulnerabilities | List of security findings |
| Branch B | Performance issues | List of performance findings |
| Branch C | Code style and readability | List of style findings |

Aggregator: Merges all findings into a single review report, de-duplicates, and prioritises by severity.
Running these in parallel is faster than running them sequentially, and each branch can use a prompt optimised for its specific concern — a security-focused prompt does not need to worry about style.
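The sectioning form can be sketched with a thread pool: each branch runs concurrently and an aggregator merges the findings. The `review` function is a stubbed stand-in for a branch-specific LLM call:

```python
from concurrent.futures import ThreadPoolExecutor

# Sectioning sketch: three independent review branches run concurrently,
# then an aggregator flattens and de-duplicates their findings.

def review(focus: str, code: str) -> list:
    # Placeholder: a real branch would prompt a model with instructions
    # tailored to its focus (security, performance, style).
    return [f"{focus}: no issues found in {len(code)} chars"]

def parallel_review(code: str) -> list:
    focuses = ["security", "performance", "style"]
    with ThreadPoolExecutor(max_workers=len(focuses)) as pool:
        # map preserves branch order even though execution is concurrent
        results = list(pool.map(lambda f: review(f, code), focuses))
    # Aggregator: flatten, de-duplicate, keep stable order.
    merged = []
    for findings in results:
        for item in findings:
            if item not in merged:
                merged.append(item)
    return merged
```

Real pipelines would typically use async API clients rather than threads, but the shape is the same: fan out, then aggregate.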
4. Evaluation and feedback loops — the refinement pattern
A powerful pipeline pattern uses one LLM call to generate output and another to evaluate it, forming a loop that iterates until the output meets quality criteria.2 This is sometimes called the evaluator-optimiser pattern.5
The evaluator provides specific, actionable feedback. The generator incorporates that feedback and produces a revised version. The cycle repeats until the evaluator passes the output or a maximum iteration count is reached.
This pattern is particularly valuable when quality criteria are clear and measurable — checking code against test cases, verifying translations against style guides, or validating factual claims against source material.
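A minimal sketch of the loop, with both calls stubbed so the control flow is visible. A real generator would fold the evaluator's feedback into its revision prompt, and a real evaluator would apply the measurable criteria described above:

```python
# Evaluator-optimiser sketch: generate, evaluate, revise, bounded by
# a maximum iteration count so the loop always terminates.

def generate(task, feedback=""):
    # Placeholder generator; a real one would include the feedback
    # text in the revision prompt.
    if not feedback:
        return f"draft of {task}"
    return f"draft of {task} [revised: {feedback}]"

def evaluate(output):
    # Placeholder evaluator returning (passed, feedback). A real one
    # would check the output against explicit quality criteria.
    if "revised" in output:
        return True, ""
    return False, "needs revision"

def refine(task, max_iters=3):
    output = generate(task)
    for _ in range(max_iters):
        passed, feedback = evaluate(output)
        if passed:
            break
        output = generate(task, feedback)
    return output
```

The `max_iters` bound matters in practice: without it, a generator that never satisfies the evaluator loops (and bills) forever.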
Key distinction
Gates are binary checkpoints between stages (pass/fail). Feedback loops are iterative — the evaluator provides detailed feedback that guides the next revision. Gates catch problems; feedback loops fix them.
5. Tool use — extending the pipeline beyond text
A pipeline stage is not limited to text-in, text-out. Stages can call external tools: APIs, databases, code interpreters, search engines, file systems. Tool use is what gives pipelines real-world impact — the ability to read, write, query, compute, and act, not just generate text.2
The pattern is straightforward: the model decides which tool to call, formats the input, receives the output, and incorporates it into its reasoning. This is the mechanism that connects LLM pipelines to the broader ecosystem of apis and services.
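The decide-call-incorporate loop can be sketched as follows. The tool registry, the JSON request format, and the stubbed `model_step` are illustrative assumptions, not any particular provider's tool-calling API:

```python
import json

# Tool-use sketch: the (stubbed) model emits a JSON tool request, the
# pipeline executes the tool, and the result is fed back into the
# model's context for the next step.

TOOLS = {
    "lookup_price": lambda sku: {"sku": sku, "price": 9.99},
}

def model_step(context: str) -> str:
    # Placeholder: a real model would decide whether to call a tool.
    if "price" not in context:
        return json.dumps({"tool": "lookup_price", "args": {"sku": "A-1"}})
    return "final answer: the price is 9.99"

def run(prompt: str) -> str:
    context = prompt
    for _ in range(5):                      # bound the tool loop
        reply = model_step(context)
        try:
            request = json.loads(reply)
        except json.JSONDecodeError:
            return reply                    # plain text = final answer
        result = TOOLS[request["tool"]](**request["args"])
        context += "\ntool result: " + json.dumps(result)
    return context
```

The loop bound plays the same role as the gate in prompt chaining: it stops a misbehaving stage from running indefinitely.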
Concept to explore
See rag (Retrieval-Augmented Generation) for the most common tool-use pattern: retrieving relevant documents from a knowledge base before generating a response.
Why pipelines outperform monolithic prompts
Three structural reasons explain why splitting a task across multiple stages produces better results than a single large prompt:4
1. Reduced cognitive load per step. Each stage asks the model to do one thing well, rather than juggling multiple concerns simultaneously. Research on LLM context use shows that models lose track of information embedded in long prompts — a phenomenon called the “lost-in-the-middle” effect.6
2. Inspectable intermediate artifacts. If the final output is wrong, you can trace back through the stages to find where the error originated. In a single prompt, you get one output and no visibility into the reasoning path that produced it.4
3. Independent optimisation. Each stage can be tuned separately: different prompts, different models (a smaller model for classification, a larger one for generation), different temperature settings, different retry policies. You cannot do this with a monolithic prompt.2
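As a sketch of per-stage tuning, a pipeline can carry a small configuration table plus a retry wrapper. The model names and stage names here are illustrative placeholders, not real identifiers:

```python
# Hypothetical per-stage configuration: each stage independently picks
# its model, temperature, and retry budget.

STAGES = {
    "classify": {"model": "small-fast-model", "temperature": 0.0, "retries": 1},
    "generate": {"model": "large-capable-model", "temperature": 0.7, "retries": 3},
}

def run_stage(name, fn, *args):
    # Run a stage's function under that stage's own retry policy.
    cfg = STAGES[name]
    last_error = None
    for _ in range(cfg["retries"]):
        try:
            return fn(*args)
        except Exception as err:   # retry on any stage failure
            last_error = err
    raise RuntimeError(f"stage {name} failed after {cfg['retries']} tries") from last_error
```

A monolithic prompt has exactly one model, one temperature, and one retry policy for everything; the table above is the structural alternative.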
Yiuno example: the concept card pipeline
Creating a concept card in this knowledge system follows a pipeline:
| Stage | What happens |
|---|---|
| 1 - Context cascade | Read CLAUDE.md, AGENTS.md, playbook, template (layered context loading) |
| 2 - Research | Search for quality explanations and authoritative sources |
| 3 - Gate | Verify that research found sufficient verified resources |
| 4 - Write | Generate the card following the template structure |
| 5 - Validate | Check quality against the playbook’s review checklist |
| 6 - Connect | Update the knowledge graph and link to parent/child cards |

Each stage has a clear input, a clear output, and a quality check. The pipeline ensures consistent quality across all cards, regardless of which topic is being written.
Why do we use it?
Key reasons
1. Reliability. A pipeline with validation gates catches errors at each stage, preventing mistakes from compounding. A single prompt has no checkpoints — if anything goes wrong, everything goes wrong.4
2. Debuggability. When the output is wrong, you can inspect each intermediate artifact to pinpoint where the problem originated. This turns “the AI gave a bad answer” into “stage 3 produced an incomplete analysis because the input from stage 2 was missing key data.”4
3. Cost and speed optimisation. Different stages can use different models. A lightweight model handles classification (fast, cheap); a powerful model handles generation (slower, more expensive). You pay for capability only where you need it.2
4. Composability. Pipeline stages are reusable building blocks. A “fact extraction” stage written for one pipeline can be reused in another. This is the same principle that makes Unix pipes powerful — small tools that do one thing well, composed into larger workflows.3
When do we use it?
- When a task involves multiple distinct phases (research, analyse, generate, review)
- When the task mixes different modes of work (extracting facts, making decisions, writing prose)
- When correctness matters and you need validation checkpoints between steps
- When you need traceability — the ability to audit how the output was produced
- When the input is too large or complex for a single prompt to handle reliably
- When you want to reuse stages across different workflows
Rule of thumb
If you can describe the task as a single, clear instruction (“summarise this paragraph”), a single prompt is fine. If you find yourself writing a prompt with multiple numbered steps, conditionals, or caveats, you are describing a pipeline — and you should build one.
How can I think about it?
The recipe analogy
A pipeline is like following a recipe with distinct preparation stages.
- Mise en place (Stage 1 - Input preparation): Gather and prepare all ingredients before cooking. In a pipeline, this is data ingestion and context loading — assembling everything the model will need.
- Prep work (Stage 2 - Extraction/transformation): Chop vegetables, marinate meat, measure spices. Each ingredient is prepared separately. In a pipeline, this is extracting facts, classifying input, or reformatting data.
- Taste test (Gate): Check the seasoning before moving on. In a pipeline, this is a validation gate that verifies quality before the next stage.
- Cooking (Stage 3 - Core generation): Combine prepared ingredients and apply heat. In a pipeline, this is the main generation step where the model produces the primary output.
- Plating (Stage 4 - Post-processing): Arrange the dish for presentation. In a pipeline, this is formatting, polishing, and final quality checks.
A chef who tries to do all of this simultaneously — chopping while sauteing while plating — produces chaos. The stages exist because each requires different attention and tools, and the order matters.
The editorial desk analogy
A pipeline is like a newspaper’s editorial process.
- Reporter (Stage 1): Gathers facts from sources and writes a raw draft. Focused on completeness, not polish.
- Fact-checker (Stage 2): Verifies every claim against sources. Catches errors before they propagate. This is a validation gate.
- Editor (Stage 3): Restructures, rewrites for clarity, enforces the publication’s style guide. Focused on quality, not gathering.
- Copy editor (Stage 4): Catches grammar, spelling, formatting issues. Fine-grained polish.
- Layout (Stage 5): Formats the final piece for publication. The template and output formatting stage.
No single person does all of these jobs simultaneously. Each role has specialised skills and a narrow focus. The newspaper’s quality comes from the pipeline, not from any individual genius — and if the fact-checker catches an error, it is fixed before the editor wastes time polishing a flawed article.
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| prompt-chaining | The simplest pipeline pattern — a strict sequence of LLM calls with validation gates | complete |
| prompt-routing | How systems classify input and direct it to specialised handlers | stub |
| parallelisation | Running independent subtasks concurrently via sectioning or voting | complete |
| evaluator-optimiser | The generate-evaluate-refine loop for iterative quality improvement | complete |
| context-cascading | Layering context from general to specific across pipeline stages | complete |
| rag | Retrieving external knowledge to augment generation | stub |
| structured-output | Constraining LLM responses to specific formats for reliable pipeline handoffs | complete |
Some cards don't exist yet
A broken link is a placeholder for future learning, not an error.
Check your understanding
Test yourself
- Explain why a multi-stage pipeline produces more reliable results than a single monolithic prompt for complex tasks. What structural properties make the difference?
- Name the five core pipeline patterns (sequential, routing, parallelisation, evaluation loops, tool use) and describe when each is most appropriate.
- Distinguish between a gate and a feedback loop in a pipeline. When would you use each?
- Interpret this scenario: a pipeline’s final output contains a factual error, but the intermediate “fact extraction” stage produced correct facts. Where in the pipeline did the error most likely originate, and how would you fix it?
- Connect LLM pipelines to the concept of APIs. How do APIs enable pipeline stages to interact with external systems, and why is this important for moving beyond text-only processing?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    AS[Agentic Systems] --> LP[LLM Pipelines]
    AS --> ORCH[Orchestration]
    LP --> PCH[Prompt Chaining]
    LP --> PR[Prompt Routing]
    LP --> PAR[Parallelisation]
    LP --> EO[Evaluator-Optimiser]
    LP --> CC[Context Cascading]
    LP --> RAG[RAG]
    LP --> SO[Structured Output]
    style LP fill:#4a9ede,color:#fff
```

Related concepts:
- orchestration — while pipelines define the flow of work through stages, orchestration manages which agents run, when, and how they coordinate
- machine-readable-formats — pipeline stages often pass structured data (JSON, YAML) between them, making machine-readable formats essential for reliable handoffs
- apis — tool-use stages in a pipeline call external services through APIs, connecting the pipeline to the wider software ecosystem
Sources
Further reading
Resources
- Building Effective Agents (Anthropic) — The foundational reference on workflow patterns from the team behind Claude, covering prompt chaining, routing, parallelisation, and orchestrator-workers
- Design Patterns for Building Agentic Workflows (Hugging Face) — Comprehensive catalogue of six design patterns with architecture diagrams and use cases
- Prompt Chaining: Building Reliable Multi-Step LLM Workflows (TheLinuxCode) — Practical, engineer-oriented guide to sequential, conditional, and looping chain patterns with runnable code
- How to Build an AI Agent Pipeline (The Thinking Company) — Enterprise-focused guide covering prerequisites, pipeline architecture, and production deployment considerations
- Building Smarter AI Systems: Multi-Stage LLM Pipelines Explained (PMDG) — Clear overview of the five-stage pipeline architecture with a focus on why single-stage systems fall short
Footnotes
1. PMDG Technologies. (2025). Building Smarter AI Systems: Multi-Stage LLM Pipelines Explained. PMDG Technologies.
2. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic.
3. The Thinking Company. (2026). How to Build an AI Agent Pipeline. The Thinking Company.
4. TheLinuxCode. (2026). Prompt Chaining: Building Reliable Multi-Step LLM Workflows. TheLinuxCode.
5. Carpintero, D. (2025). Design Patterns for Building Agentic Workflows. Hugging Face.
6. Liu, N. F., et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12.
