Constraining a language model to produce responses in a specific, machine-readable format — such as JSON with defined fields and types — rather than free-form text.
What is it?
When you ask a language model a question, it normally replies in natural language — sentences and paragraphs, like a person writing an email. That works well for conversation, but it creates a serious problem for automation. If the next step in your pipeline is a piece of software that needs to read the model’s answer, free-form text is unreliable. A program cannot easily extract the “price” from a sentence that says “The price is around $42, give or take” — it needs {"price": 42}.
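The fragility is easy to demonstrate. In this minimal sketch (both response strings are invented for illustration), extracting a price from prose requires a pattern match that only works for one particular phrasing, while the structured version is one standard call:

```python
import json
import re

# Two plausible model replies to the same question.
prose = "The price is around $42, give or take."
structured = '{"price": 42}'

# Free text: a regex can pull out a number, but only if the phrasing
# happens to match -- and "around" still leaves the meaning ambiguous.
match = re.search(r"\$(\d+(?:\.\d+)?)", prose)
price_from_prose = float(match.group(1)) if match else None

# Structured output: unambiguous, phrasing-independent.
price_from_json = json.loads(structured)["price"]

print(price_from_prose)  # 42.0 -- but only because this phrasing matched
print(price_from_json)   # 42
```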
Structured output is the practice of constraining an LLM so that its response conforms to a predefined schema — a blueprint that specifies exactly which fields must be present, what types they must be, and what values are allowed.[1] Instead of asking the model to “describe the event”, you tell it to return a JSON object with title (string), date (string), location (string), and attendee_count (integer). The model’s response is then guaranteed (or at least strongly encouraged) to match that shape.
This matters because LLM pipelines — the parent concept llm-pipelines — pass data between stages. If stage 1 produces free-form text where stage 2 expects structured data, the pipeline breaks. Structured output is the mechanism that makes inter-stage handoffs reliable.[2] It is also what enables tool-use: when a model decides to call a function, it must produce a structured object (the function name and arguments) that a program can execute — not a prose description of what it wants to do.
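As a sketch of why tool use depends on this, consider a hypothetical tool call. The function name, registry, and argument names here are invented for illustration and do not reflect any specific provider's format — the point is that a fixed shape makes dispatch mechanical:

```python
# Hypothetical tool call emitted by a model: a function name plus
# typed arguments, not a prose description of the intent.
tool_call = {
    "name": "create_calendar_event",   # illustrative function name
    "arguments": {
        "title": "PyCon 2026",
        "date": "2026-05-14",
        "attendee_count": 3500,
    },
}

# Because the shape is fixed, executing the call is a lookup plus unpack.
def dispatch(call, registry):
    return registry[call["name"]](**call["arguments"])

registry = {
    "create_calendar_event": lambda title, date, attendee_count: (
        f"Created '{title}' on {date} for {attendee_count} people"
    ),
}
print(dispatch(tool_call, registry))
```

A prose reply ("I'd like to add PyCon to the calendar in May") offers no such lookup key, which is why tool calling is built on structured output.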
The prerequisite concept json explains the data format most commonly used for structured output. Understanding key-value pairs, objects, arrays, and nesting is essential before working with structured output, because JSON Schema — the validation language used to define the expected shape — builds directly on those primitives.
In plain terms
Structured output is like giving someone a form to fill in instead of asking them to write a letter. The form has labelled boxes (fields) with specific formats (date, number, yes/no). You get back exactly the information you need, in exactly the shape you need it — no extra prose, no missing fields, no surprises.
At a glance
Free text vs structured output
graph LR
subgraph Free Text
Q1[Query] --> LLM1[LLM] --> PROSE[Prose Response]
PROSE -->|parse?| APP1[Application]
APP1 -->|brittle| ERR[Errors]
end
subgraph Structured Output
Q2[Query] --> SCHEMA[Schema + LLM] --> JSON_OUT[JSON Response]
JSON_OUT -->|validate| APP2[Application]
APP2 -->|reliable| OK[Success]
end
Key: Without a schema, the application must guess how to extract data from prose — a brittle process that frequently breaks. With a schema, the model’s output is constrained to a predictable shape that the application can validate and consume directly.
How does it work?
Structured output is achieved through a combination of techniques, ranging from simple prompt instructions to deep integration with the model’s generation process. Each approach offers a different trade-off between simplicity and reliability.
1. Prompt-based enforcement
The simplest approach is to instruct the model in the prompt: “Return your answer as a JSON object with these fields.” This works surprisingly often, but it offers no guarantee. The model might add explanatory text before the JSON, use slightly different field names, or produce invalid syntax.[3]
For example:
Prompt: "Extract the event details. Return ONLY valid JSON with
these fields: title (string), date (string), location (string),
attendee_count (integer)."
Model response (hoping for the best):
{
"title": "PyCon 2026",
"date": "May 14-22",
"location": "Pittsburgh",
"attendee_count": 3500
}
Prompt-based enforcement is useful for prototyping and simple tasks, but it is insufficient for production systems where downstream code depends on the output shape being exact every time.
Think of it like...
Asking someone “please write your answer on the form” without actually giving them the form. Polite people will try, but you have no guarantee they will use the right format or include all the fields.
2. JSON Mode and response format constraints
Major model providers now offer a dedicated JSON mode that constrains the model’s output to valid JSON at the token-generation level. OpenAI’s response_format: { type: "json_object" } and Anthropic’s tool-use-based structured output both use this approach.[4][5]
JSON mode guarantees syntactically valid JSON, but it does not guarantee the JSON matches your specific schema. The model might return {"event_name": "PyCon"} when you expected {"title": "PyCon"}. This is where JSON Schema enforcement goes further.
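A minimal stdlib sketch of that failure mode (the response string is invented): the output parses as JSON without error, yet none of the fields the application expects are present:

```python
import json

# JSON mode guarantees output like this parses...
response_text = '{"event_name": "PyCon", "when": "May 14-22"}'
data = json.loads(response_text)  # no error: syntactically valid JSON

# ...but "valid JSON" is not "matches my schema".
expected_fields = {"title", "date", "location", "attendee_count"}
missing = expected_fields - data.keys()
print(sorted(missing))  # every field the application actually needed
```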
3. JSON Schema enforcement
The most reliable approach combines JSON mode with a specific JSON Schema that defines the exact fields, types, and constraints. OpenAI’s Structured Outputs feature (introduced in 2024) accepts a JSON Schema or Pydantic model and uses constrained decoding to guarantee the output matches the schema exactly — not just valid JSON, but valid according to your schema.[4]
For example, using Pydantic with OpenAI:
from pydantic import BaseModel

class Event(BaseModel):
    title: str
    date: str
    location: str
    attendee_count: int
    is_virtual: bool

# The model MUST return an object matching this schema
response = client.chat.completions.parse(
    model="gpt-4o",
    response_format=Event,
    messages=[...]
)
This is constrained decoding: at each token-generation step, the model is only allowed to produce tokens that keep the output conforming to the schema. It cannot deviate, add extra fields, or use wrong types.[1]
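The same schema can also be checked locally. Assuming Pydantic v2, a conforming response parses cleanly into the Event model, while a wrongly typed field is rejected — this is a sketch of client-side validation, not the provider's internal decoding mechanism:

```python
from pydantic import BaseModel, ValidationError

class Event(BaseModel):
    title: str
    date: str
    location: str
    attendee_count: int
    is_virtual: bool

# A schema-conforming response parses cleanly into a typed object.
good = ('{"title": "PyCon 2026", "date": "May 14-22", '
        '"location": "Pittsburgh", "attendee_count": 3500, '
        '"is_virtual": false}')
event = Event.model_validate_json(good)

# A response with a wrong type raises a ValidationError.
bad = good.replace("3500", '"lots"')
try:
    Event.model_validate_json(bad)
except ValidationError as exc:
    print(exc.error_count(), "validation error")
```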
Think of it like...
A web form with input validation. The date field only accepts dates. The number field only accepts numbers. The user cannot submit the form until all required fields are filled correctly. The validation is enforced by the form itself, not by hoping the user follows instructions.
Before and after: the same query with and without structured output
Query: “Extract details from: Join us for PyCon 2026 in Pittsburgh on May 14-22. Expected 3500 attendees. In-person only.”
Without structured output (free text):
The event is PyCon 2026, taking place in Pittsburgh
from May 14 to 22. They expect about 3,500 attendees
and it will be held in person this year.
A program trying to extract attendee_count from this text would need to parse “about 3,500” — handling the comma, the word “about”, and the variable phrasing.
With structured output (schema enforcement):
{
  "title": "PyCon 2026",
  "date": "May 14-22",
  "location": "Pittsburgh",
  "attendee_count": 3500,
  "is_virtual": false
}
Every field is present, correctly typed, and directly usable by downstream code. No parsing, no guessing, no ambiguity.
4. Validation and retry loops
Even with schema enforcement, production systems add a validation layer after generation. This catches edge cases: a field that is syntactically valid but semantically wrong (e.g., attendee_count: -5), or a model that produces a refusal instead of data.[3]
The pattern is straightforward:
Generate the response with schema constraints
Validate the output against the schema (and business rules)
If validation fails, retry with error feedback appended to the prompt
After N retries, return a structured error or escalate
This validate-retry loop is a specific instance of the evaluator-optimizer pattern described in llm-pipelines.
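The four steps above can be sketched in a few lines. Everything here is illustrative — generate stands in for any LLM call that returns raw text, and the validation rules are invented for the running event example:

```python
import json

MAX_RETRIES = 3

def validate(data):
    """Schema and business rules for the event payload (sketch)."""
    errors = []
    if not isinstance(data.get("attendee_count"), int):
        errors.append("attendee_count must be an integer")
    elif data["attendee_count"] < 0:
        errors.append("attendee_count must be non-negative")
    return errors

def generate_with_retries(generate, prompt):
    for _ in range(MAX_RETRIES):
        raw = generate(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            # Step 3: retry with error feedback appended to the prompt.
            prompt += f"\nYour last reply was not valid JSON: {exc}"
            continue
        errors = validate(data)
        if not errors:
            return data  # Step 2 passed: valid against schema and rules.
        prompt += "\nFix these problems: " + "; ".join(errors)
    # Step 4: after N retries, return a structured error.
    return {"error": "validation failed after retries"}

# Simulated model: semantically wrong on the first attempt, then correct.
replies = iter(['{"attendee_count": -5}', '{"attendee_count": 3500}'])
result = generate_with_retries(lambda p: next(replies), "Extract the event.")
print(result)
```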
Concept to explore
See guardrails for the broader framework of constraints that keep LLM systems safe and reliable — structured output is one guardrail among many.
5. The flexibility-reliability trade-off
Structured output introduces a fundamental tension: the more tightly you constrain the model’s output, the more reliable it becomes for automation — but the less room it has for nuance, explanation, or unexpected but useful information.[2]
| Approach | Reliability | Flexibility | Best for |
| --- | --- | --- | --- |
| Free text | Low | High | Conversation, creative tasks |
| Prompt-based JSON | Medium | Medium | Prototyping, simple extraction |
| JSON Mode | High | Medium | Guaranteed valid JSON, flexible schema |
| Schema enforcement | Very high | Low | Production pipelines, tool calls |
The right choice depends on what consumes the output. If a human reads it, flexibility matters. If a program reads it, reliability wins.
Why do we use it?
Key reasons
1. Automation reliability. Downstream systems — APIs, databases, other pipeline stages — need predictable data shapes. Structured output eliminates the fragile parsing layer between the LLM and the rest of the system.[1]
2. Reduced hallucination in tool-call pipelines. When a model must produce a specific function name and typed arguments, it is less likely to fabricate information than when writing free prose. The schema acts as a constraint that narrows the space of possible outputs.[2]
3. Validation becomes possible. You cannot meaningfully validate free text against a specification. With structured output, you can check every field for type, range, format, and presence — catching errors before they propagate downstream.[3]
4. Interoperability. Structured output in standard formats (JSON, XML) can be consumed by any programming language or system. This makes LLMs composable with existing software infrastructure, not siloed text generators.[1]
When do we use it?
When the LLM’s output will be consumed by code rather than read by a human
When building multi-stage pipelines where one stage’s output is the next stage’s input
When the model needs to call tools or functions (tool use requires structured arguments)
When extracting specific data points from unstructured text (entity extraction, classification)
When multiple models or systems need to exchange information in a common format
When you need to validate, test, or audit the model’s outputs programmatically
Rule of thumb
If the next consumer of the model’s output is a program (not a person), use structured output. If it is a person, free text is usually better.
How can I think about it?
The order form analogy
Structured output is like ordering from a restaurant using an order form instead of telling the waiter what you want in a conversation.
The order form is the schema — it defines what information is needed (dish, quantity, special requests, table number)
Each field has a type: dish is selected from a menu (enum), quantity is a number, special requests is free text
The kitchen (downstream system) can process the form directly — no waiter needs to interpret your casual conversation and translate it into kitchen instructions
If a required field is blank, the form is rejected before it reaches the kitchen (validation)
The form constrains your order: you cannot order a dish that is not on the menu, and you cannot write a poem in the quantity field
The trade-off: you lose the ability to say “something like yesterday’s special, but spicier” — the form does not have a field for that. Structured output sacrifices conversational flexibility for processing reliability.
The airport customs declaration analogy
Structured output is like a customs declaration form at an airport.
Every traveller (query) gets the same form (schema) with the same fields
The form specifies exact formats: passport number (alphanumeric, fixed length), date (DD/MM/YYYY), value of goods (number in local currency)
Customs officers (downstream systems) can process thousands of forms efficiently because every one has the same shape
A form with a missing passport number is rejected at the counter (validation), not discovered later when it causes a problem in the database
Without the form, each traveller would write a letter describing their trip — some would include the needed information, some would not, and processing would be slow and error-prone
The customs form exists because the system handling the data needs predictability at scale — exactly the same reason LLM pipelines use structured output.
Check your understanding
Test yourself
Explain why free-form text output from an LLM is problematic for automation. What specific failures can occur when a program tries to consume unstructured text?
Name the four main approaches to structured output (prompt-based, JSON mode, schema enforcement, validation loops) and describe the reliability-flexibility trade-off of each.
Distinguish between JSON mode and JSON Schema enforcement. Why is syntactically valid JSON not sufficient for most production use cases?
Interpret this scenario: a pipeline stage produces {"attendee_count": "three thousand five hundred"} when the schema specifies attendee_count as an integer. What went wrong, and which enforcement approach would have prevented it?
Connect structured output to the concept of tool use. Why must a model produce structured output (not prose) when calling a function?
Where this concept fits
Position in the knowledge graph
graph TD
LP[LLM Pipelines] --> PC[Prompt Chaining]
LP --> PR[Prompt Routing]
LP --> CC[Context Cascading]
LP --> RAG[RAG]
LP --> SO[Structured Output]
JSON[JSON] -.->|prerequisite| SO
SO -.->|related| TU[Tool Use]
SO -.->|related| GR[Guardrails]
style SO fill:#4a9ede,color:#fff
Related concepts:
machine-readable-formats — structured output produces data in machine-readable formats; JSON is the most common choice
tool-use — tool calling depends on structured output to format function names and arguments as parseable objects
structured-data-vs-prose — structured output is one answer to the broader question of when data should be structured rather than free-form
guardrails — output schema enforcement is one type of guardrail that constrains LLM behaviour for reliability
Building Effective Agents (Anthropic) — Foundational reference on agent workflow patterns, including how structured output enables tool use and pipeline handoffs