Context Cascading

Layering context files from general to specific so an LLM builds up understanding progressively, instead of receiving everything at once.


What is it?

When you work with a large language model on a complex task, the model needs background knowledge, rules, and task-specific instructions to produce good results. The naive approach is to dump everything into a single prompt. Context cascading is the alternative: you organise context into distinct layers, ordered from broad to narrow, and load each layer in sequence so the model accumulates understanding progressively.1

The pattern typically follows a hierarchy: global context (organisation-wide identity and rules) feeds into domain context (area-specific knowledge and conventions), which feeds into task context (the procedure for the specific job at hand), which feeds into output context (the template or format for the deliverable). Each layer narrows the scope and increases the specificity.

This matters because LLMs process all of their context as a single input. The order, structure, and relevance of that input directly affect output quality.2 Context cascading is a design pattern that treats context as infrastructure rather than an afterthought — engineering what the model sees so that it can reason effectively.

In plain terms

Think of context cascading like dressing for the weather. You start with a base layer (your general knowledge), add an insulating mid-layer (domain-specific rules), and finish with a shell layer (task-specific instructions). Each layer serves a purpose, and the order matters — you would not put the rain jacket on before the base layer.


How does it work?

Context cascading operates through four layers, each with a distinct role. The layers are loaded in order, and each one assumes the previous layers are already in place.

1. Global context — identity and guardrails

The broadest layer defines who the system is, what voice it uses, and what rules are non-negotiable. This layer rarely changes. It might include the organisation’s name, communication style, ethical boundaries, and universal formatting rules.

For example: a company might maintain a global configuration file that says “Always write in British English, never include pricing unless confirmed, and cite sources for all factual claims.”

Think of it like...

The constitution of an organisation. It applies to everything, changes slowly, and overrides lower-level decisions when there is a conflict.

This layer is present in every session the model runs. It creates a stable foundation that downstream layers can rely on without re-stating basics.1


2. Domain context — area-specific knowledge

The second layer narrows focus to a particular area of work. A marketing team, an engineering team, and a legal team within the same organisation would each have their own domain context — built on the same global layer but adding field-specific conventions, terminology, and processes.

For example: the engineering domain context might specify coding conventions, preferred frameworks, and testing requirements. The marketing domain context might specify brand voice, audience personas, and content approval workflows.

Think of it like...

A department handbook. The company constitution still applies, but this handbook adds the rules specific to your department. Someone from another department does not need to read it.


3. Task context — the procedure for this job

The third layer is specific to the exact task being performed. It contains step-by-step instructions, decision criteria, and references to resources needed for this particular action. Task context is the most frequently swapped layer — a new task means a new task context, while the global and domain layers stay the same.

For example: “When writing a blog post, follow this outline structure, use these SEO keywords, and include at least one external citation per section.”


4. Output context — the template

The final layer defines the shape of the deliverable. This might be a document template, a code scaffold, a data schema, or a structured format. It gives the model a concrete target to fill in, rather than generating structure from scratch.

Templates enforce consistency across outputs. When every blog post follows the same template, the model does not have to invent a structure each time — it focuses on content quality instead.3

Think of it like...

A form to fill in. The previous layers told the model what to know and how to behave. The template tells it what to produce.
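The four layers above can be sketched in a few lines of Python. This is a minimal illustration, not a real system: the company name, layer contents, and separator are all invented for the example. The point is the assembly order — global first, template last — with each layer assuming the ones before it are in place.

```python
# Minimal sketch of a four-layer context cascade.
# All layer contents are illustrative placeholders.

GLOBAL_CONTEXT = (
    "You write for Acme Ltd. Always use British English. "
    "Never include pricing unless confirmed. Cite sources for factual claims."
)

DOMAIN_CONTEXT = {
    "marketing": "Brand voice: friendly but precise. Audience: SME founders.",
    "engineering": "Follow PEP 8. Prefer the standard library. Tests are required.",
}

TASK_CONTEXT = {
    "blog-post": "Write a blog post using the outline below. One citation per section.",
}

OUTPUT_TEMPLATE = {
    "blog-post": "# {title}\n\n## Introduction\n\n## Body\n\n## Conclusion",
}

def build_prompt(domain: str, task: str) -> str:
    """Assemble the cascade broad-to-narrow: global -> domain -> task -> output."""
    layers = [
        GLOBAL_CONTEXT,
        DOMAIN_CONTEXT[domain],
        TASK_CONTEXT[task],
        OUTPUT_TEMPLATE[task],
    ]
    return "\n\n---\n\n".join(layers)

prompt = build_prompt("marketing", "blog-post")
# The global rules open the prompt; the template closes it.
```

Swapping the task is a one-line change (`build_prompt("marketing", "landing-page")` once that task file exists); the global and domain layers are untouched.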


Why order matters

The sequence of layers is not arbitrary. LLMs give disproportionate weight to information at the beginning and end of their context window — a phenomenon known as the “lost-in-the-middle” effect.4 By placing stable, high-priority context (global rules) first and task-specific details last, context cascading exploits this attention pattern.

Additionally, each layer acts as a filter for the next. The global layer constrains what the domain layer can specify. The domain layer constrains what the task layer can ask for. This prevents contradictions: if a task instruction conflicts with a global rule, the global rule wins because it was established first and carries higher authority.2
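One way to picture the precedence rule is a per-setting merge where broader layers win on conflict. This sketch (keys and values invented for illustration) keeps the first value seen for each setting, so a task-level override of a global rule is silently ignored:

```python
# Sketch of layer precedence: merge per-key settings so that earlier
# (broader) layers win on conflict. Keys and values are illustrative.

def merge_with_precedence(*layers: dict) -> dict:
    """Merge layer settings; the first (broadest) layer wins on conflict."""
    merged: dict = {}
    for layer in layers:                   # iterate broad -> narrow
        for key, value in layer.items():
            merged.setdefault(key, value)  # keep the earlier layer's value
    return merged

global_rules = {"language": "en-GB", "cite_sources": True}
task_rules = {"language": "en-US", "max_words": 800}  # conflicts on "language"

effective = merge_with_precedence(global_rules, task_rules)
# "language" stays "en-GB": the global rule was established first.
# "max_words" passes through: the task layer may add, but not override.
```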


Context versus instructions

A common mistake is treating all input to an LLM as “instructions.” Context and instructions serve different purposes:

|          | Context                                              | Instructions                                        |
|----------|------------------------------------------------------|-----------------------------------------------------|
| Purpose  | Background the model needs to understand the situation | Specific actions the model should take            |
| Changes  | Slowly (global) to frequently (task)                 | Every task                                          |
| Example  | "We are a healthcare company regulated by HIPAA"     | "Summarise this patient report in 3 bullet points"  |
| Analogy  | The briefing before a mission                        | The mission orders themselves                       |

Context cascading layers both context and instructions, but keeps them distinct within each layer. The global layer is mostly context with some standing instructions. The task layer is mostly instructions with some task-specific context.5


Why not dump everything at once?

Three reasons progressive loading outperforms monolithic prompts:1

  1. Token efficiency. Loading only what is needed for the current task conserves the context window for actual reasoning. Teams that audit their context budget often discover they waste 40% or more on information irrelevant to the current step.5

  2. Signal-to-noise ratio. When everything is loaded at once, critical instructions compete with background information for the model’s attention. Targeted context selection consistently outperforms exhaustive loading — one insurance company found that curated context reached over 95% accuracy, while feeding in the full document corpus performed far worse.5

  3. Maintainability. When context lives in separate layers, you can update one layer without touching the others. A change to global style rules propagates to every task without editing individual task files.
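The token-efficiency argument can be made concrete with a crude budget check. The sketch below (layer names, contents, and the one-token-per-word estimate are all rough assumptions for illustration) loads only the layers a task needs and compares the cost against loading everything:

```python
# Sketch of selective loading: estimate a rough "token" budget by word count
# and load only the layers relevant to the current task. Numbers illustrative.

def rough_tokens(text: str) -> int:
    """Crude token estimate: roughly one token per word, good enough for budgeting."""
    return len(text.split())

library = {
    "global": "Always write in British English. Cite sources.",
    "domain/legal": "Contract review conventions and clause checklists. " * 50,
    "domain/marketing": "Brand voice, audience personas, approval workflow. " * 50,
    "task/blog-post": "Outline structure, SEO keywords, citation rules. " * 10,
}

def load(selected: list[str]) -> str:
    """Concatenate only the named layers, in the order given."""
    return "\n\n".join(library[name] for name in selected)

everything = load(list(library))
just_needed = load(["global", "domain/marketing", "task/blog-post"])

savings = 1 - rough_tokens(just_needed) / rough_tokens(everything)
# The irrelevant legal layer never enters the context window.
```

In a real system the selection step is where prompt routing comes in: something has to decide which domain and task layers apply to the incoming request.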

Concept to explore

See prompt-routing for how systems decide which context layers to load for a given task.


Why do we use it?

Key reasons

1. Consistency across sessions. Because the global and domain layers persist, the model behaves consistently even when different people trigger different tasks. Everyone shares the same foundation.1

2. Scalability. New tasks only require a new task-layer file and optionally a new template. The rest of the cascade stays unchanged. Teams can add capabilities without redesigning the system.

3. Reduced errors. Each layer constrains the next, preventing contradictions and drift. The model is less likely to hallucinate or ignore rules because the relevant constraints are always loaded and always in the right position.2

4. Efficient context use. By loading only the layers relevant to the current task, context cascading maximises the ratio of useful information to total tokens. This matters because context windows are finite and every irrelevant token degrades reasoning.5


When do we use it?

  • When an LLM-based system needs to handle multiple task types with shared rules and conventions
  • When multiple people or agents interact with the same system and consistency matters
  • When the context for a single task would exceed practical limits if loaded all at once
  • When you need to maintain and update system behaviour without rewriting everything
  • When building agentic workflows where sub-agents need clean, scoped context to avoid pollution from unrelated tasks5

Rule of thumb

If your LLM system has more than one type of task or more than one person using it, context cascading will improve consistency and reduce maintenance overhead.


How can I think about it?

The military briefing chain

Military operations use a strict briefing hierarchy that mirrors context cascading.

  • Strategic briefing (global context): The theatre commander sets the overall objective, rules of engagement, and constraints. This applies to every unit in the operation.
  • Operational briefing (domain context): The division commander translates the strategy into a plan for their area of responsibility. They add terrain knowledge, resource allocations, and coordination rules.
  • Tactical briefing (task context): The squad leader gives specific instructions for the next mission — route, timing, targets, and contingencies.
  • Mission card (output template): Each soldier carries a card with call signs, frequencies, and checkpoints — the structured format for reporting back.

No one dumps the entire theatre strategy on a squad leader. Each level filters and refines, passing down only what the next level needs plus what it inherited from above.

Russian nesting dolls

Context cascading works like a set of matryoshka dolls, where each doll fits inside a larger one.

  • The outermost doll (global context) is the biggest and most visible. It defines the overall identity — the style of painting, the colour scheme, the theme.
  • The middle dolls (domain and task context) each add finer detail — facial expressions, accessories, unique patterns — while staying consistent with the outer doll’s style.
  • The innermost doll (output template) is the most specific: the final, concrete shape that everything else was building toward.

Each layer is self-contained (you can examine any single doll on its own), but it only makes full sense within the set. And critically, the outer layers constrain the inner ones: you cannot fit a doll that is larger than its container.


Concepts to explore next

| Concept               | What it covers                                           | Status |
|-----------------------|----------------------------------------------------------|--------|
| prompt-routing        | How systems decide which context to load for each task   | stub   |
| playbooks-as-programs | Encoding multi-step procedures as structured context     | stub   |
| knowledge-graphs      | Organising concepts into connected, navigable structures | stub   |

Some cards don't exist yet

A broken link is a placeholder for future learning, not an error.


Where this concept fits

Position in the knowledge graph

graph TD
    LP[LLM Pipelines] --> CC[Context Cascading]
    LP --> PR[Prompt Routing]
    LP --> PP[Playbooks as Programs]
    CC --> KG[Knowledge Graphs]
    style CC fill:#4a9ede,color:#fff

Related concepts:

  • prompt-routing — decides which context layers to load based on the incoming task
  • playbooks-as-programs — the task-context layer often takes the form of a structured playbook
  • knowledge-graphs — provide the domain-context layer with structured, navigable knowledge

Footnotes

  1. Groves, C. (2025). Hierarchical Context Loading: Why Progressive Disclosure Beats Monolithic Prompts. notchrisgroves.com.

  2. Chakraborty, S., Ray, S., and Gujre, A. (2026). Context Engineering for LLMs: The Five-Layer Architecture Guide. Fractal Analytics.

  3. PixelMojo. (2026). Context Engineering Beyond CLAUDE.md: The 5-Layer Hierarchy. PixelMojo.

  4. Liu, N. F., et al. (2024). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics, 12.

  5. Paperclipped. (2026). Context Engineering for AI Agents: Complete 2026 Guide. Paperclipped.