Structured Data vs Prose

The fundamental trade-off in information systems: structured data is precise and machine-parseable but rigid; prose is rich and human-readable but ambiguous for machines --- and most real systems need both.

What is it?

Every piece of information you work with sits somewhere on a spectrum between two extremes. At one end is prose --- free-form natural language that humans write and read easily. A paragraph explaining a concept, an email describing a problem, a meeting transcript. At the other end is structured data --- information organised into predictable formats with explicit types, keys, and relationships. A database row, a JSON object, a spreadsheet column.¹

The tension between these two forms is one of the oldest problems in computing, and it has become even more important in the age of AI. Language models excel at processing prose --- they can read, summarise, translate, and generate natural language with remarkable fluency. But when a system needs to act on information (filter it, sort it, compute with it, pipe it between services), prose is treacherous. “The meeting is next Tuesday at 3pm” is perfectly clear to a human but requires parsing, inference, and disambiguation for a machine.²
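To make that ambiguity concrete, here is a minimal Python sketch of what a program has to decide before it can act on "next Tuesday at 3pm". The helper name next_tuesday_3pm and its skip_one flag are hypothetical, introduced only for illustration. The sketch assumes a known reference date and deliberately leaves the time zone unspecified (a naive datetime), which is itself one of the gaps the prose leaves open.

```python
from datetime import date, datetime, time, timedelta

TUESDAY = 1  # date.weekday() numbers Monday as 0, so Tuesday is 1


def next_tuesday_3pm(reference: date, skip_one: bool = False) -> datetime:
    """Resolve "next Tuesday at 3pm" relative to a reference date (hypothetical helper).

    skip_one=False reads it as "the first Tuesday after the reference date";
    skip_one=True reads it as "the Tuesday after that one".
    """
    days_ahead = (TUESDAY - reference.weekday()) % 7 or 7  # never "today", even on a Tuesday
    if skip_one:
        days_ahead += 7
    # Naive datetime: "3pm" carries no time zone, so none is attached here.
    return datetime.combine(reference + timedelta(days=days_ahead), time(15, 0))


monday = date(2024, 1, 1)  # 1 January 2024 was a Monday
print(next_tuesday_3pm(monday))                 # 2024-01-02 15:00:00
print(next_tuesday_3pm(monday, skip_one=True))  # 2024-01-09 15:00:00
```

Said on a Monday, the two readings are eight days apart. A structured timestamp such as an ISO 8601 value with an explicit UTC offset carries none of this ambiguity.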

The parent card machine-readable-formats covers the spectrum from unstructured text to fully constrained databases. This card zooms into the architectural decision at the heart of that spectrum: when should information be structured, when should it stay as prose, and how do you handle the cases where you need both?

This is not about any specific format --- json covers JSON syntax and usage. This card is about the tension itself and the patterns for managing it.

In plain terms

Imagine two ways to describe your home address. You could write a sentence: “I live in the yellow house on the corner of Rue de Bourg and Rue Saint-Martin, second floor, the one with the balcony facing south.” A human could find it. Or you could fill in a form: Street: Rue de Bourg 12, Floor: 2, City: Lausanne, Postal Code: 1003. A delivery system could find it. Prose gives context and nuance. Structure gives precision and reliability. Most systems need a bit of both.

At a glance

The structure spectrum
graph LR
    A[Free Prose] --> B[Semi-Structured]
    B --> C[Structured Data]
    C --> D[Typed Schema]
    style A fill:#94a3b8,color:#fff
    style B fill:#7c9abf,color:#fff
    style C fill:#4a9ede,color:#fff
    style D fill:#2563eb,color:#fff
Key: Information moves from left (maximum human readability, minimum machine parsability) to right (maximum machine parsability, minimum human nuance). Free prose is a blog post or email. Semi-structured is a markdown file with YAML frontmatter. Structured data is a JSON object or CSV row. Typed schema is a relational database with constraints. Most AI systems operate in the middle, combining prose with structured metadata.

How does it work?

1. What prose does well

Prose --- natural language in sentences and paragraphs --- is the native format of human thought and communication. It excels at things structured data cannot do:¹

Nuance and ambiguity: “The project is mostly on track, though the API integration is taking longer than expected and may slip by a week.” No structured status field captures this level of detail.
Explanation and reasoning: “We chose React over Vue because our team has more experience with it and the component ecosystem is richer for our use case.” This is a decision rationale --- it needs narrative, not a key-value pair.
Context and framing: Prose can establish why something matters, how it connects to other things, and what the reader should pay attention to. Structured data stores facts; prose tells the story around them.

The limitation of prose is the flip side of its strength: because it is flexible and expressive, a machine cannot reliably extract specific facts from it without natural language processing, and even the best NLP models can get it wrong.²

Think of it like...

Prose is a conversation with a knowledgeable colleague. You get rich context, caveats, and connections --- but if you need to extract a specific number or date from a 30-minute conversation, you might misremember or miss it entirely. Structure is a form your colleague fills out --- you get exactly the fields you asked for, but none of the context.

2. What structured data does well

Structured data organises information into predictable patterns with explicit types, keys, and relationships. As the machine-readable-formats parent card describes, this means defined syntax, deterministic parsing, and explicit structure.³

Structured data excels at:

Machine processing: A program can filter, sort, aggregate, and transform structured data without ambiguity. Querying “all orders above 100 EUR placed in March” is trivial in a database; extracting the same information from a paragraph of prose requires natural language processing, today usually a language model.
Validation: You can enforce constraints --- this field must be a number, this date must be in the future, this status must be one of three allowed values. Prose cannot be validated this way.
Interoperability: Two systems that agree on a schema can exchange structured data seamlessly. Two systems trying to exchange prose must agree on how to interpret it --- a much harder problem.³
Computation: You can calculate averages, totals, percentages, and trends from structured data. You cannot calculate an average from a paragraph.
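A minimal Python sketch of these properties, using the standard-library sqlite3 module and an in-memory database. The orders table, its columns, and the sample rows are hypothetical. The March query from the first point, an aggregate, and a constraint that rejects bad input each take a single statement, with no interpretation step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        amount_eur REAL NOT NULL CHECK (amount_eur >= 0),  -- validation lives in the schema
        placed_on TEXT NOT NULL                            -- ISO 8601 date, e.g. 2026-03-14
    )
    """
)
conn.executemany(
    "INSERT INTO orders (amount_eur, placed_on) VALUES (?, ?)",
    [(250.0, "2026-03-02"), (80.0, "2026-03-15"), (120.0, "2026-04-01"), (310.0, "2026-03-28")],
)

# Machine processing: "all orders above 100 EUR placed in March" is a plain filter.
march_over_100 = conn.execute(
    "SELECT id, amount_eur FROM orders "
    "WHERE amount_eur > 100 AND placed_on BETWEEN '2026-03-01' AND '2026-03-31' "
    "ORDER BY id"
).fetchall()
print(march_over_100)  # [(1, 250.0), (4, 310.0)]

# Computation: the average March order value is one aggregate.
(march_average,) = conn.execute(
    "SELECT AVG(amount_eur) FROM orders WHERE placed_on LIKE '2026-03-%'"
).fetchone()
print(round(march_average, 2))  # 213.33

# Validation: the CHECK constraint rejects a negative amount before it is stored.
try:
    conn.execute("INSERT INTO orders (amount_eur, placed_on) VALUES (-5, '2026-03-10')")
except sqlite3.IntegrityError as err:
    print("rejected:", err)
```

ISO 8601 dates stored as text sort chronologically, which is why the plain string comparison in BETWEEN works here.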

The limitation is rigidity. A structured format captures what its designer anticipated, not what the real world throws at it. An unexpected situation that does not fit the schema gets lost, truncated, or crammed into a “notes” field --- which is prose.

Think of it like...

Structured data is a tax form. Every box has a label, a type (number, date, checkbox), and rules (this must be positive, this cannot exceed that). The tax authority can process millions of forms automatically. But the form cannot capture “I was unemployed for three months, then freelanced, then got a job” --- for that, you need the attached letter of explanation.

3. The synchronisation problem

Most real systems contain both prose and structured data describing the same things. A product has a database record (name, price, SKU, stock count) and a marketing description (a paragraph of prose). A knowledge system has frontmatter (structured metadata) and a body (explanatory prose). The problem is keeping them consistent.⁴

Consider these scenarios:

A product’s price changes in the database, but the marketing page still says the old price
A concept card’s frontmatter lists three children, but the body mentions four
An API’s documentation describes five endpoints, but the actual API has six

This is the synchronisation problem: when the same information exists in both structured and unstructured forms, they drift apart over time unless actively maintained. The more copies, the more drift.⁴

Example: frontmatter and body drift
Consider a concept card in a knowledge system. The frontmatter says:
children:
  - "[[child-a]]"
  - "[[child-b]]"
But the body text says: “This concept has three sub-topics: child-a, child-b, and child-c.”

A human reading the body gets one picture. A script reading the frontmatter gets another. Neither is wrong on its own --- they are inconsistent with each other. The fix is either a validation script that checks frontmatter against body, or a generation step that produces one from the other.
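The validation-script option can be sketched in a few lines of Python. This sketch assumes something the example above does not: that the body links to child cards as [[wikilinks]] rather than naming them in plain text. The function name check_children is hypothetical. It is deliberately crude: every body link that is not a declared child is flagged, including links to related or parent cards, so its output is a list for a human to review rather than a verdict. Catching a mismatch in a phrase like “three sub-topics” would still need a human or an LLM.

```python
import re

# Captures the target of [[target]], [[target|alias]] or [[target#heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def link_targets(text: str) -> set[str]:
    return {target.strip() for target in WIKILINK.findall(text)}


def check_children(frontmatter_children: list[str], body: str) -> dict[str, set[str]]:
    """Compare the frontmatter 'children' list with the links found in the body."""
    declared = set()
    for entry in frontmatter_children:
        declared |= link_targets(entry)
    linked = link_targets(body)
    return {
        "linked_but_not_declared": linked - declared,  # candidates for the frontmatter
        "declared_but_not_linked": declared - linked,  # children the prose never mentions
    }


children = ["[[child-a]]", "[[child-b]]"]
body = "This concept has three sub-topics: [[child-a]], [[child-b]], and [[child-c]]."
print(check_children(children, body))
# {'linked_but_not_declared': {'child-c'}, 'declared_but_not_linked': set()}
```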

4. Patterns for bridging the gap

Several practical patterns manage the tension between structure and prose:⁴⁵

Frontmatter + markdown: The pattern used in this knowledge system. YAML frontmatter provides machine-readable metadata (parent, children, tags, status). The markdown body provides human-readable explanation. Scripts can process the frontmatter; humans read the body. The synchronisation problem is managed by validation scripts.
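The split itself is mechanical, which is what makes the pattern practical. A minimal Python sketch, assuming the common convention of a frontmatter block opened and closed by a line containing only "---"; the function names and the card contents are hypothetical. scalar_fields reads only simple top-level key: value lines and skips nested lists, so a real script would use a full YAML parser (for example PyYAML's yaml.safe_load) instead.

```python
def split_frontmatter(text: str) -> tuple[str, str]:
    """Return (frontmatter, body) for a card whose first line is '---'."""
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text  # no frontmatter: everything is body
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "".join(lines[1:i]), "".join(lines[i + 1:])
    return "", text  # unclosed block: treat everything as body


def scalar_fields(frontmatter: str) -> dict[str, str]:
    """Read top-level 'key: value' lines; keys whose value is a nested list are skipped."""
    fields = {}
    for line in frontmatter.splitlines():
        if not line or line[0].isspace() or ":" not in line:
            continue
        key, _, value = line.partition(":")
        value = value.strip().strip('"')
        if value:
            fields[key.strip()] = value
    return fields


card = (
    "---\n"
    'parent: "[[machine-readable-formats]]"\n'
    "tags:\n"
    "  - knowledge-engineering\n"
    "status: complete\n"
    "---\n"
    "# Structured Data vs Prose\n"
    "\n"
    "Prose for the reader starts here.\n"
)
frontmatter, body = split_frontmatter(card)
print(scalar_fields(frontmatter))  # {'parent': '[[machine-readable-formats]]', 'status': 'complete'}
```

Scripts work with the first return value; the second is what a human reads.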

JSON Schema + natural-language descriptions: An API schema defines the structure (field names, types, constraints), while description fields provide prose explanations of what each field means. The structure is for machines; the descriptions are for developers.
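A small example of this pattern, written as a Python dict that mirrors a JSON Schema document. type, required, properties, enum, and description are standard JSON Schema keywords; the order schema itself is hypothetical. The check function enforces only required and enum, a tiny subset chosen for illustration; a real project would hand the schema to a full validator such as the Python jsonschema package. describe shows the other consumer: the same schema rendered as prose for developers.

```python
ORDER_SCHEMA = {
    "type": "object",
    "required": ["id", "status"],
    "properties": {
        "id": {
            "type": "integer",
            "description": "Order number assigned at checkout.",
        },
        "status": {
            "type": "string",
            "enum": ["pending", "shipped", "cancelled"],
            "description": "Current fulfilment state of the order.",
        },
    },
}


def check(instance: dict, schema: dict) -> list[str]:
    """Machine consumer: enforce 'required' and 'enum' only (a subset of JSON Schema)."""
    errors = [f"missing field: {name}" for name in schema.get("required", []) if name not in instance]
    for name, rules in schema["properties"].items():
        if name in instance and "enum" in rules and instance[name] not in rules["enum"]:
            errors.append(f"{name} must be one of {rules['enum']}")
    return errors


def describe(schema: dict) -> str:
    """Human consumer: turn the description fields into reference text."""
    return "\n".join(
        f"{name} ({rules['type']}): {rules['description']}"
        for name, rules in schema["properties"].items()
    )


print(check({"id": 42, "status": "lost"}, ORDER_SCHEMA))
# ["status must be one of ['pending', 'shipped', 'cancelled']"]
print(describe(ORDER_SCHEMA))
```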

Structured LLM output: Modern language models can be instructed to output structured data (JSON, YAML, XML) rather than prose.⁵ This bridges the gap by using the model’s prose-understanding ability to produce structured output. For example, an LLM can read a paragraph about a person and output { "name": "...", "role": "...", "organisation": "..." }.
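Structured output only helps if the receiving code checks it, because depending on the setup a model can still return malformed or incomplete JSON. A minimal Python sketch of the receiving side: no model API is called here, and the reply strings are hard-coded stand-ins for model output. The function name parse_person and the exact-keys rule are hypothetical choices for this example.

```python
import json

EXPECTED_KEYS = {"name", "role", "organisation"}


def parse_person(reply: str) -> dict[str, str]:
    """Parse and validate a reply that was supposed to be a single JSON object."""
    data = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) on non-JSON text
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if set(data) != EXPECTED_KEYS:
        raise ValueError(f"wrong keys: {sorted(set(data) ^ EXPECTED_KEYS)}")
    if not all(isinstance(value, str) for value in data.values()):
        raise ValueError("every field must be a string")
    return data


good = '{"name": "Ada Moreau", "role": "data engineer", "organisation": "Example Org"}'
print(parse_person(good)["role"])  # data engineer

# A chatty preamble and a missing field are both rejected rather than passed downstream.
for bad in ['Sure! Here is the JSON: {"name": "Ada Moreau"}', '{"name": "Ada Moreau"}']:
    try:
        parse_person(bad)
    except ValueError as err:
        print("rejected:", err)
```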

Embeddings as a bridge: embeddings convert prose into numerical vectors that machines can compare, cluster, and search. The prose retains its richness for human readers; the embedding provides a structured representation for machine operations like similarity search.⁵
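Once prose has been embedded, comparing meanings becomes arithmetic. A minimal Python sketch of cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors are invented for illustration; real embedding models produce vectors with hundreds or thousands of dimensions, and the vectors would come from a model rather than being written by hand.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Ranges from -1 to 1; higher means the vectors point in more similar directions."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Invented 3-dimensional vectors standing in for real model embeddings.
documents = {
    "The API integration may slip by a week.": [0.9, 0.1, 0.2],
    "Integration work is running behind schedule.": [0.8, 0.2, 0.3],
    "Gently fold in the chopped chocolate.": [0.1, 0.9, 0.1],
}
query = [0.85, 0.15, 0.25]  # stand-in embedding of "Is the integration delayed?"

for text in sorted(documents, key=lambda t: cosine_similarity(documents[t], query), reverse=True):
    print(f"{cosine_similarity(documents[text], query):.3f}  {text}")
```

Both integration sentences score far above the recipe sentence even though the word “delayed” appears in neither; the prose itself stays untouched for human readers.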

Key distinction

The goal is almost never to eliminate one form in favour of the other. It is to use each form for what it does best and manage the boundary between them. Structure for machines. Prose for humans. Validation for consistency.

5. How AI changes the balance

The rise of large language models has shifted the trade-off in an important way.⁵ Before LLMs, unstructured prose was essentially opaque to machines --- extracting structured information from text required custom NLP pipelines that were brittle and expensive. Now, an LLM can:

Extract structured data from prose (read a contract, output key terms as JSON)
Generate prose from structured data (read a database record, write a product description)
Validate consistency between the two (compare frontmatter to body text and flag discrepancies)
Transform between formats (convert a meeting transcript into action items with owners and deadlines)

This does not eliminate the trade-off --- structured data is still faster, cheaper, and more reliable for machine processing than running every query through an LLM. But it means the boundary between “machine-readable” and “human-readable” is more fluid than ever before.

Yiuno example

This knowledge system is built on the frontmatter + markdown pattern. Each concept card has YAML frontmatter (structured: parent, children, tags, status, level) and a markdown body (prose: explanations, analogies, examples).

The generate-graph.py script reads only the frontmatter to build graph.json; it ignores the prose entirely. A human reader mostly reads the prose body and can ignore the frontmatter, which is metadata.
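The frontmatter-only design can be sketched as follows. This is not the actual generate-graph.py, whose code is not shown here; it is a hypothetical Python outline that takes frontmatter already parsed into dicts and emits a nodes-and-edges structure. The graph.json layout used below (nodes, edges, source, target, type) is an assumption made for the example.

```python
import json


def build_graph(cards: dict[str, dict]) -> dict:
    """Build nodes and edges from parsed frontmatter only; card bodies are never read."""
    nodes = [{"id": slug, "status": fm.get("status")} for slug, fm in sorted(cards.items())]
    edges = [
        {"source": slug, "target": child.strip("[]"), "type": "child"}
        for slug, fm in sorted(cards.items())
        for child in fm.get("children", [])
    ]
    return {"nodes": nodes, "edges": edges}


# Hypothetical parsed frontmatter, keyed by card slug.
cards = {
    "machine-readable-formats": {
        "status": "complete",
        "children": ["[[json]]", "[[structured-data-vs-prose]]"],
    },
    "json": {"status": "complete"},
    "structured-data-vs-prose": {},
}

print(json.dumps(build_graph(cards), indent=2))
```

Because the body is never read, editing the prose cannot change the graph; this is also why frontmatter/body drift needs its own check.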

When Claude Code creates a card, it uses both: it reads existing cards’ frontmatter to understand graph relationships (structured), reads their prose to understand framing and avoid contradictions (unstructured), and outputs a new card with both frontmatter (for the graph) and prose (for the reader).

Why do we use it?

Key reasons

1. Right tool for the job. Some information is inherently structured (a price, a date, a status). Some is inherently unstructured (an explanation, a rationale, a narrative). Forcing one into the other’s format always loses something.¹

2. Enabling both human and machine workflows. In any system where humans and machines collaborate, you need forms that each can process effectively. Prose for human comprehension. Structure for machine automation. Both for the hybrid workflows that define modern AI systems.³

3. Managing complexity. As systems grow, the volume of information exceeds what any single approach can handle. Structured data scales through databases and queries. Prose scales through search and summarisation. Understanding the trade-off lets you design systems that scale both ways.²

4. Avoiding false choices. The most common mistake is treating this as an either/or decision. Real systems need both. Understanding the trade-off helps you design for coexistence rather than picking a side and suffering the consequences.

When do we use it?

When designing a knowledge system that needs to serve both human readers and automated scripts
When building AI pipelines that must extract structured data from unstructured sources (documents, emails, transcripts)
When creating API documentation that must be both human-readable and machine-parseable
When deciding how to store metadata for a content system (frontmatter, database, or both)
When evaluating data formats for a new project and weighing readability against parsability

Rule of thumb

If the primary consumer is a human, default to prose with structured metadata on the side. If the primary consumer is a machine, default to structured data with prose descriptions where needed. If both humans and machines are primary consumers, use the frontmatter + body pattern.

How can I think about it?

The recipe book vs the ingredient database

A recipe book and a supermarket’s ingredient database contain overlapping information, but in fundamentally different forms.

The recipe book (prose) says: “Gently fold in 200g of dark chocolate, roughly chopped --- the bigger the chunks, the better the texture contrast when it melts.” A human reads this and knows exactly what to do, including the why behind the technique.

The ingredient database (structured) says: { "item": "dark chocolate", "quantity_g": 200, "preparation": "chopped" }. A shopping app reads this and adds it to your cart. A nutrition calculator computes the calories.

The recipe book cannot be queried (“show me all recipes using more than 100g of chocolate”). The database cannot teach you how to cook. A well-designed system has both: the database for machine operations, the prose for human learning --- and a process to keep them in sync.
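The “both” design can be sketched as records that carry structured ingredients next to the prose method. The recipes are hypothetical, and the ingredient fields reuse the ones from the database example above. The structured part answers the query the recipe book cannot; the method text travels with it for the human cook.

```python
recipes = [
    {
        "title": "Chunky chocolate cookies",
        "ingredients": [
            {"item": "dark chocolate", "quantity_g": 200, "preparation": "chopped"},
            {"item": "flour", "quantity_g": 250, "preparation": None},
        ],
        "method": "Gently fold in the chocolate, roughly chopped: bigger chunks give more texture contrast.",
    },
    {
        "title": "Chocolate glaze",
        "ingredients": [{"item": "dark chocolate", "quantity_g": 50, "preparation": "melted"}],
        "method": "Melt the chocolate slowly and pour it over the cake while warm.",
    },
]


def recipes_using_more_than(recipes: list[dict], item: str, grams: float) -> list[str]:
    """Answer 'show me all recipes using more than N grams of X' from the structured fields."""
    return [
        recipe["title"]
        for recipe in recipes
        if any(i["item"] == item and i["quantity_g"] > grams for i in recipe["ingredients"])
    ]


print(recipes_using_more_than(recipes, "dark chocolate", 100))  # ['Chunky chocolate cookies']
```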

The medical chart vs the doctor's notes

A patient’s medical record exists in two forms that serve different purposes.

The structured chart has fields: blood pressure (120/80), temperature (37.1 °C), diagnosis code (J06.9), medication (paracetamol 500mg), follow-up date (2026-04-12). Insurance systems, pharmacy systems, and scheduling systems all read these fields automatically.

The doctor’s notes (prose) say: “Patient presents with mild upper respiratory symptoms, likely viral. No red flags. Advised rest and fluids. Seemed anxious about missing work --- discussed sick leave options.” This captures nuance, clinical reasoning, and patient context that no structured field can hold.

Both are essential. The structured data runs the hospital’s operations. The prose ensures continuity of care between doctors. When they contradict each other (the chart says “resolved” but the notes say “ongoing concern”), patient safety is at risk --- this is the synchronisation problem in a high-stakes domain.

Concepts to explore next

Concept	What it covers	Status
json	The most common structured data format for web and AI systems	complete
knowledge-graphs	Representing relationships as structured, traversable data	stub
embeddings	Converting prose into numerical vectors for machine processing	stub
machine-readable-formats	The family of formats that sit on the structured end of the spectrum	complete

Some cards don't exist yet

A broken link is a placeholder for future learning, not an error.

Check your understanding

Test yourself

Explain why most real-world systems need both structured data and prose rather than choosing one exclusively.

Name three things prose does well that structured data cannot, and three things structured data does well that prose cannot.

Distinguish between the synchronisation problem and the format choice problem. How are they related but different?

Interpret this scenario: a team stores all project documentation as free-form wiki pages. They discover that tracking which features are complete, in progress, or blocked requires manually reading every page. What architectural change would help, and why?

Connect this concept to LLMs: how do language models change the historical trade-off between structured data and prose?

Where this concept fits

Position in the knowledge graph
graph TD
    KE[Knowledge Engineering] --> MRF[Machine-Readable Formats]
    MRF --> JSON[JSON]
    MRF --> SDvP[Structured Data vs Prose]
    SDvP -.->|related| JSON
    SDvP -.->|related| KG[Knowledge Graphs]
    SDvP -.->|related| EMB[Embeddings]
    style SDvP fill:#4a9ede,color:#fff
Related concepts:

json --- one of the primary structured data formats; this card explains when to use structure, while json.md explains how to use JSON specifically

knowledge-graphs --- a way to represent relationships as structured, traversable data that can coexist with prose explanations

embeddings --- a bridge technology that converts prose into numerical vectors, enabling machine operations on unstructured text

machine-readable-formats --- the parent topic covering the full family of structured formats
