LLM Pipelines
A multi-step workflow where a language model receives input, processes it through a sequence of stages, and produces output — turning a single interaction into a structured chain of transformations.
What is it?
When most people first use a large language model, they type a single prompt and get a single response. That works for simple tasks — summarising a paragraph, answering a factual question, drafting a short email. But as soon as the task involves multiple phases (research, then analysis, then writing), multiple concerns (accuracy, tone, format), or multiple tools (search, database, code execution), a single prompt-response exchange starts to break down.1
An LLM pipeline is the alternative. Instead of asking the model to do everything in one shot, you decompose the task into discrete stages, where each stage has a focused job and passes its output to the next stage as input. The model runs multiple times — once per stage — with each run receiving only the context it needs for that particular step.2
This is not a new idea. Software engineering has used pipelines for decades (Unix pipes, ETL processes, CI/CD pipelines). The LLM version applies the same principle: break a complex process into small, composable, independently testable steps. What makes LLM pipelines distinctive is that the “processing” at each stage is done by a language model — reasoning, generating, evaluating, transforming — rather than by deterministic code alone.3
The parent concept, agentic-systems, frames LLM pipelines as one of two main building blocks (alongside orchestration). Where orchestration concerns how agents are coordinated and managed, pipelines concern the internal structure of how work flows through a system — the sequence of transformations that turns a request into a result.
In plain terms
A single prompt is like asking someone one question and expecting a perfect answer. A pipeline is like giving someone a checklist: first gather the facts, then analyse them, then write up the findings, then check for errors. Each step is simpler, and you can verify the result before moving on.
At a glance
From single prompt to multi-stage pipeline
```mermaid
graph LR
    subgraph Single Prompt
        A1[Input] --> B1[LLM] --> C1[Output]
    end
    subgraph Pipeline
        A2[Input] --> B2[Stage 1 - Extract] --> G[Gate]
        G -->|pass| D2[Stage 2 - Reason]
        D2 --> E2[Stage 3 - Generate]
        E2 --> F2[Output]
        G -->|fail| B2
    end
```

Key: The single-prompt approach asks the model to do everything at once. The pipeline breaks the task into stages with validation gates between them. Each stage has a narrow job, and the gate checks quality before passing output forward. If a gate fails, the stage can be retried or repaired without restarting the entire process.
How does it work?
LLM pipelines are built from a small set of patterns that can be combined. Anthropic’s research on building effective agents identifies these as the foundational workflow patterns that underpin most production systems.2
1. Prompt chaining — the sequential pattern
The simplest pipeline is a straight line: the output of one LLM call becomes the input of the next. Each step focuses on a single sub-task, and you can insert programmatic checks (gates) between steps to validate that the process stays on track.2
For example, a content creation pipeline might look like:
| Step | Task | Output |
|---|---|---|
| 1 | Research and extract key facts from source material | Structured fact list |
| 2 | Generate an outline based on the facts | Document outline |
| 3 | Gate: check that the outline covers all required topics | Pass/fail decision |
| 4 | Write the full draft from the outline | Draft document |
| 5 | Review and edit for tone, accuracy, and style | Final document |
Each step is easier for the model than doing everything at once. And critically, each intermediate output is an inspectable artifact — if step 4 produces a bad draft, you can examine the outline from step 2 to see whether the problem originated there.4
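The chain above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real implementation: `call_llm` is a stubbed placeholder for an actual model API call, and the gate is a plain programmatic check between stages.

```python
# Minimal prompt-chain sketch with a validation gate between stages.
# `call_llm` is a hypothetical stand-in for a real model call; here it
# echoes its prompt so the control flow can run end to end.

def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a model API here.
    return "stub output:\n" + prompt

def extract_facts(source: str) -> str:
    return call_llm("Extract the key facts from:\n" + source)

def make_outline(facts: str) -> str:
    return call_llm("Create an outline from these facts:\n" + facts)

def gate_outline(outline: str, required_topics: list) -> bool:
    # Programmatic gate: pass only if every required topic is mentioned.
    return all(topic.lower() in outline.lower() for topic in required_topics)

def write_draft(outline: str) -> str:
    return call_llm("Write a full draft following this outline:\n" + outline)

def run_pipeline(source: str, required_topics: list) -> str:
    facts = extract_facts(source)          # step 1
    outline = make_outline(facts)          # step 2
    if not gate_outline(outline, required_topics):  # step 3: gate
        raise ValueError("Gate failed: outline is missing required topics")
    return write_draft(outline)            # step 4

```

Because the gate raises instead of silently passing bad output forward, a failure points directly at the stage that produced it.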
Think of it like...
An assembly line in a factory. Each station has one job (cut, weld, paint, inspect). No station tries to build the entire product. The product gains value at each station, and quality checks between stations catch defects early — before more work is wasted on a flawed component.
Concept to explore
See context-cascading for how context is layered from general to specific across pipeline stages, ensuring each stage receives the right information without overloading the model’s context window.
2. Routing — the branching pattern
Not every input should follow the same path. A routing step classifies the input and directs it to a specialised downstream handler. This allows different types of requests to receive different treatment without one handler trying to be good at everything.2
For example: a customer service pipeline might classify incoming messages as billing questions, technical issues, or general inquiries, then route each to a stage with specialised instructions and tools for that category.
Routing can be implemented through rule-based logic (keyword matching, regex), LLM-based classification (asking the model to categorise the input), or a hybrid of both.5
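A hybrid router can be sketched as rules first, model fallback second. The categories, keywords, and the stubbed `llm_classify` function below are illustrative assumptions, not a real classification API:

```python
import re

# Hybrid router: cheap rule-based matching handles the obvious cases,
# with an LLM classifier (stubbed here) as fallback for inputs the
# rules miss.

RULES = {
    "billing": re.compile(r"\b(invoice|refund|charge|payment)\b", re.I),
    "technical": re.compile(r"\b(error|crash|bug|install)\b", re.I),
}

def llm_classify(message: str) -> str:
    # Placeholder for a model call such as:
    #   "Classify this message as billing, technical, or general: ..."
    return "general"

def route(message: str) -> str:
    for category, pattern in RULES.items():
        if pattern.search(message):
            return category          # fast, deterministic path
    return llm_classify(message)     # fall back to the model
```

The deterministic path costs nothing per request; only ambiguous messages pay for a model call.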
Think of it like...
A hospital triage desk. The triage nurse does not treat patients — they assess the situation and direct each patient to the right department (emergency, outpatient, specialist). The nurse’s job is fast classification; the department’s job is specialised treatment.
Concept to explore
See prompt-routing for a deeper dive into how routing decisions are made and the trade-offs between rule-based and LLM-based classification.
3. Parallelisation — the divide-and-conquer pattern
When a task has independent sub-tasks, you can run them simultaneously rather than sequentially. This comes in two forms: sectioning (splitting the work into different sub-tasks that run in parallel) and voting (running the same task multiple times with different approaches to get diverse outputs).2
For example: when reviewing a document, one parallel branch could check for factual accuracy while another checks for tone and style. Neither depends on the other, so they run concurrently, and their results are aggregated at the end.
Example: parallel code review
Consider a code review pipeline:
| Branch | Focus | Output |
|---|---|---|
| Branch A | Security vulnerabilities | List of security findings |
| Branch B | Performance issues | List of performance findings |
| Branch C | Code style and readability | List of style findings |

Aggregator: Merges all findings into a single review report, de-duplicates, and prioritises by severity.
Running these in parallel is faster than running them sequentially, and each branch can use a prompt optimised for its specific concern — a security-focused prompt does not need to worry about style.
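The sectioning form can be sketched with a thread pool: each branch runs concurrently and an aggregator merges the findings. The `review` function is a stubbed stand-in for a branch-specific LLM call:

```python
from concurrent.futures import ThreadPoolExecutor

# Sectioning sketch: three independent review branches run concurrently,
# then an aggregator flattens and de-duplicates their findings.

def review(focus: str, code: str) -> list:
    # Placeholder: a real branch would prompt a model with instructions
    # tailored to its focus (security, performance, style).
    return [f"{focus}: no issues found in {len(code)} chars"]

def parallel_review(code: str) -> list:
    focuses = ["security", "performance", "style"]
    with ThreadPoolExecutor(max_workers=len(focuses)) as pool:
        # map preserves branch order even though execution is concurrent
        results = list(pool.map(lambda f: review(f, code), focuses))
    # Aggregator: flatten, de-duplicate, keep stable order.
    merged = []
    for findings in results:
        for item in findings:
            if item not in merged:
                merged.append(item)
    return merged
```

Real pipelines would typically use async API clients rather than threads, but the shape is the same: fan out, then aggregate.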
4. Evaluation and feedback loops — the refinement pattern
A powerful pipeline pattern uses one LLM call to generate output and another to evaluate it, forming a loop that iterates until the output meets quality criteria.2 This is sometimes called the evaluator-optimiser pattern.5
The evaluator provides specific, actionable feedback. The generator incorporates that feedback and produces a revised version. The cycle repeats until the evaluator passes the output or a maximum iteration count is reached.
This pattern is particularly valuable when quality criteria are clear and measurable — checking code against test cases, verifying translations against style guides, or validating factual claims against source material.
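A minimal sketch of the loop, with both calls stubbed so the control flow is visible. A real generator would fold the evaluator's feedback into its revision prompt, and a real evaluator would apply the measurable criteria described above:

```python
# Evaluator-optimiser sketch: generate, evaluate, revise, bounded by
# a maximum iteration count so the loop always terminates.

def generate(task, feedback=""):
    # Placeholder generator; a real one would include the feedback
    # text in the revision prompt.
    if not feedback:
        return f"draft of {task}"
    return f"draft of {task} [revised: {feedback}]"

def evaluate(output):
    # Placeholder evaluator returning (passed, feedback). A real one
    # would check the output against explicit quality criteria.
    if "revised" in output:
        return True, ""
    return False, "needs revision"

def refine(task, max_iters=3):
    output = generate(task)
    for _ in range(max_iters):
        passed, feedback = evaluate(output)
        if passed:
            break
        output = generate(task, feedback)
    return output
```

The `max_iters` bound matters in practice: without it, a generator that never satisfies the evaluator loops (and bills) forever.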
Key distinction
Gates are binary checkpoints between stages (pass/fail). Feedback loops are iterative — the evaluator provides detailed feedback that guides the next revision. Gates catch problems; feedback loops fix them.
5. Tool use — extending the pipeline beyond text
A pipeline stage is not limited to text-in, text-out. Stages can call external tools: APIs, databases, code interpreters, search engines, file systems. Tool use is what gives pipelines real-world impact — the ability to read, write, query, compute, and act, not just generate text.2
The pattern is straightforward: the model decides which tool to call, formats the input, receives the output, and incorporates it into its reasoning. This is the mechanism that connects LLM pipelines to the broader ecosystem of apis and services.
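The decide-call-incorporate loop can be sketched as follows. The tool registry, the JSON request format, and the stubbed `model_step` are illustrative assumptions, not any particular provider's tool-calling API:

```python
import json

# Tool-use sketch: the (stubbed) model emits a JSON tool request, the
# pipeline executes the tool, and the result is fed back into the
# model's context for the next step.

TOOLS = {
    "lookup_price": lambda sku: {"sku": sku, "price": 9.99},
}

def model_step(context: str) -> str:
    # Placeholder: a real model would decide whether to call a tool.
    if "price" not in context:
        return json.dumps({"tool": "lookup_price", "args": {"sku": "A-1"}})
    return "final answer: the price is 9.99"

def run(prompt: str) -> str:
    context = prompt
    for _ in range(5):                      # bound the tool loop
        reply = model_step(context)
        try:
            request = json.loads(reply)
        except json.JSONDecodeError:
            return reply                    # plain text = final answer
        result = TOOLS[request["tool"]](**request["args"])
        context += "\ntool result: " + json.dumps(result)
    return context
```

The loop bound plays the same role as the gate in prompt chaining: it stops a misbehaving stage from running indefinitely.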
Concept to explore
See rag (Retrieval-Augmented Generation) for the most common tool-use pattern: retrieving relevant documents from a knowledge base before generating a response.
Why pipelines outperform monolithic prompts
Three structural reasons explain why splitting a task across multiple stages produces better results than a single large prompt:4
1. Reduced cognitive load per step. Each stage asks the model to do one thing well, rather than juggling multiple concerns simultaneously. Research on LLM context use shows that models lose track of information embedded in long prompts — a phenomenon called the “lost-in-the-middle” effect.6
2. Inspectable intermediate artifacts. If the final output is wrong, you can trace back through the stages to find where the error originated. In a single prompt, you get one output and no visibility into the reasoning path that produced it.4
3. Independent optimisation. Each stage can be tuned separately: different prompts, different models (a smaller model for classification, a larger one for generation), different temperature settings, different retry policies. You cannot do this with a monolithic prompt.2
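As a sketch of per-stage tuning, a pipeline can carry a small configuration table plus a retry wrapper. The model names and stage names here are illustrative placeholders, not real identifiers:

```python
# Hypothetical per-stage configuration: each stage independently picks
# its model, temperature, and retry budget.

STAGES = {
    "classify": {"model": "small-fast-model", "temperature": 0.0, "retries": 1},
    "generate": {"model": "large-capable-model", "temperature": 0.7, "retries": 3},
}

def run_stage(name, fn, *args):
    # Run a stage's function under that stage's own retry policy.
    cfg = STAGES[name]
    last_error = None
    for _ in range(cfg["retries"]):
        try:
            return fn(*args)
        except Exception as err:   # retry on any stage failure
            last_error = err
    raise RuntimeError(f"stage {name} failed after {cfg['retries']} tries") from last_error
```

A monolithic prompt has exactly one model, one temperature, and one retry policy for everything; the table above is the structural alternative.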
Yiuno example: the concept card pipeline
Creating a concept card in this knowledge system follows a pipeline:
| Stage | What happens |
|---|---|
| 1 - Context cascade | Read CLAUDE.md, AGENTS.md, playbook, template (layered context loading) |
| 2 - Research | Search for quality explanations and authoritative sources |
| 3 - Gate | Verify that research found sufficient verified resources |
| 4 - Write | Generate the card following the template structure |
| 5 - Validate | Check quality against the playbook’s review checklist |
| 6 - Connect | Update the knowledge graph and link to parent/child cards |

Each stage has a clear input, a clear output, and a quality check. The pipeline ensures consistent quality across all cards, regardless of which topic is being written.
Why do we use it?
Key reasons
1. Reliability. A pipeline with validation gates catches errors at each stage, preventing mistakes from compounding. A single prompt has no checkpoints — if anything goes wrong, everything goes wrong.4
2. Debuggability. When the output is wrong, you can inspect each intermediate artifact to pinpoint where the problem originated. This turns “the AI gave a bad answer” into “stage 3 produced an incomplete analysis because the input from stage 2 was missing key data.”4
3. Cost and speed optimisation. Different stages can use different models. A lightweight model handles classification (fast, cheap); a powerful model handles generation (slower, more expensive). You pay for capability only where you need it.2
4. Composability. Pipeline stages are reusable building blocks. A “fact extraction” stage written for one pipeline can be reused in another. This is the same principle that makes Unix pipes powerful — small tools that do one thing well, composed into larger workflows.3
When do we use it?
- When a task involves multiple distinct phases (research, analyse, generate, review)
- When the task mixes different modes of work (extracting facts, making decisions, writing prose)
- When correctness matters and you need validation checkpoints between steps
- When you need traceability — the ability to audit how the output was produced
- When the input is too large or complex for a single prompt to handle reliably
- When you want to reuse stages across different workflows
Rule of thumb
If you can describe the task as a single, clear instruction (“summarise this paragraph”), a single prompt is fine. If you find yourself writing a prompt with multiple numbered steps, conditionals, or caveats, you are describing a pipeline — and you should build one.
How can I think about it?
The recipe analogy
A pipeline is like following a recipe with distinct preparation stages.
- Mise en place (Stage 1 - Input preparation): Gather and prepare all ingredients before cooking. In a pipeline, this is data ingestion and context loading — assembling everything the model will need.
- Prep work (Stage 2 - Extraction/transformation): Chop vegetables, marinate meat, measure spices. Each ingredient is prepared separately. In a pipeline, this is extracting facts, classifying input, or reformatting data.
- Taste test (Gate): Check the seasoning before moving on. In a pipeline, this is a validation gate that verifies quality before the next stage.
- Cooking (Stage 3 - Core generation): Combine prepared ingredients and apply heat. In a pipeline, this is the main generation step where the model produces the primary output.
- Plating (Stage 4 - Post-processing): Arrange the dish for presentation. In a pipeline, this is formatting, polishing, and final quality checks.
A chef who tries to do all of this simultaneously — chopping while sauteing while plating — produces chaos. The stages exist because each requires different attention and tools, and the order matters.
The editorial desk analogy
A pipeline is like a newspaper’s editorial process.
- Reporter (Stage 1): Gathers facts from sources and writes a raw draft. Focused on completeness, not polish.
- Fact-checker (Stage 2): Verifies every claim against sources. Catches errors before they propagate. This is a validation gate.
- Editor (Stage 3): Restructures, rewrites for clarity, enforces the publication’s style guide. Focused on quality, not gathering.
- Copy editor (Stage 4): Catches grammar, spelling, formatting issues. Fine-grained polish.
- Layout (Stage 5): Formats the final piece for publication. The template and output formatting stage.
No single person does all of these jobs simultaneously. Each role has specialised skills and a narrow focus. The newspaper’s quality comes from the pipeline, not from any individual genius — and if the fact-checker catches an error, it is fixed before the editor wastes time polishing a flawed article.
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| prompt-chaining | The simplest pipeline pattern — a strict sequence of LLM calls with validation gates | complete |
| prompt-routing | How systems classify input and direct it to specialised handlers | stub |
| parallelisation | Running independent subtasks concurrently via sectioning or voting | complete |
| evaluator-optimiser | The generate-evaluate-refine loop for iterative quality improvement | complete |
| context-cascading | Layering context from general to specific across pipeline stages | complete |
| rag | Retrieving external knowledge to augment generation | stub |
| structured-output | Constraining LLM responses to specific formats for reliable pipeline handoffs | complete |
Some cards don't exist yet
A broken link is a placeholder for future learning, not an error.
Check your understanding
Test yourself
- Explain why a multi-stage pipeline produces more reliable results than a single monolithic prompt for complex tasks. What structural properties make the difference?
- Name the five core pipeline patterns (sequential, routing, parallelisation, evaluation loops, tool use) and describe when each is most appropriate.
- Distinguish between a gate and a feedback loop in a pipeline. When would you use each?
- Interpret this scenario: a pipeline’s final output contains a factual error, but the intermediate “fact extraction” stage produced correct facts. Where in the pipeline did the error most likely originate, and how would you fix it?
- Connect LLM pipelines to the concept of APIs. How do APIs enable pipeline stages to interact with external systems, and why is this important for moving beyond text-only processing?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    AS[Agentic Systems] --> LP[LLM Pipelines]
    AS --> ORCH[Orchestration]
    LP --> PCH[Prompt Chaining]
    LP --> PR[Prompt Routing]
    LP --> PAR[Parallelisation]
    LP --> EO[Evaluator-Optimiser]
    LP --> CC[Context Cascading]
    LP --> RAG[RAG]
    LP --> SO[Structured Output]
    style LP fill:#4a9ede,color:#fff
```

Related concepts:
- orchestration — while pipelines define the flow of work through stages, orchestration manages which agents run, when, and how they coordinate
- machine-readable-formats — pipeline stages often pass structured data (JSON, YAML) between them, making machine-readable formats essential for reliable handoffs
- apis — tool-use stages in a pipeline call external services through APIs, connecting the pipeline to the wider software ecosystem
Sources
Further reading
Resources
- Building Effective Agents (Anthropic) — The foundational reference on workflow patterns from the team behind Claude, covering prompt chaining, routing, parallelisation, and orchestrator-workers
- Design Patterns for Building Agentic Workflows (Hugging Face) — Comprehensive catalogue of six design patterns with architecture diagrams and use cases
- Prompt Chaining: Building Reliable Multi-Step LLM Workflows (TheLinuxCode) — Practical, engineer-oriented guide to sequential, conditional, and looping chain patterns with runnable code
- How to Build an AI Agent Pipeline (The Thinking Company) — Enterprise-focused guide covering prerequisites, pipeline architecture, and production deployment considerations
- Building Smarter AI Systems: Multi-Stage LLM Pipelines Explained (PMDG) — Clear overview of the five-stage pipeline architecture with a focus on why single-stage systems fall short
Footnotes
1. PMDG Technologies. (2025). Building Smarter AI Systems: Multi-Stage LLM Pipelines Explained. PMDG Technologies.
2. Schluntz, E. and Zhang, B. (2024). Building Effective Agents. Anthropic.
3. The Thinking Company. (2026). How to Build an AI Agent Pipeline. The Thinking Company.
4. TheLinuxCode. (2026). Prompt Chaining: Building Reliable Multi-Step LLM Workflows. TheLinuxCode.
5. Carpintero, D. (2025). Design Patterns for Building Agentic Workflows. Hugging Face.
6. Liu, N. F., et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12.
