Embeddings
Numerical representations of text, images, or audio as lists of numbers (vectors) in high-dimensional space, where similar meanings are placed close together and different meanings are placed far apart.
What is it?
Computers do not understand words the way humans do. To a computer, the word “dog” is just a sequence of characters — d, o, g — with no inherent meaning. It has no idea that “dog” is closer in meaning to “puppy” than to “democracy.” Embeddings solve this problem by translating meaning into numbers.1
An embedding is a vector — a list of numbers — that represents a piece of content (a word, a sentence, a paragraph, an image) in a way that captures its meaning. The key property is geometric: items with similar meanings end up as vectors that are close together in space, while items with different meanings end up far apart.2 The word “king” and the word “queen” would have vectors that are near each other, because they share many semantic properties (royalty, authority, leadership). The word “bicycle” would be far away from both.
These vectors are produced by embedding models — neural networks trained on massive amounts of text (or images, or audio) to learn which concepts are related and how. The model does not follow hand-written rules about meaning; it learns patterns from data. After training, you pass any text into the model, and it returns a vector — typically a list of hundreds or thousands of numbers — that encodes the meaning of that text.3
The parent concept, machine-readable-formats, covers how data is structured for machines to process. Embeddings are a specific kind of machine-readable format: instead of encoding data as key-value pairs (JSON) or rows and columns (CSV), they encode meaning as coordinates in a mathematical space. This makes them uniquely powerful for tasks where you need to compare meanings rather than match exact strings.
In plain terms
Embeddings are like GPS coordinates for meaning. Just as GPS turns a physical location (“the Eiffel Tower”) into numbers (48.8584, 2.2945) that a computer can work with, an embedding turns a concept (“royal female leader”) into a list of numbers that captures what it means. Nearby coordinates mean nearby meanings.
At a glance
From words to vectors to similarity
```mermaid
graph LR
    A[Text Input] --> B[Embedding Model]
    B --> C[Vector]
    C --> D{Compare}
    E[Another Text] --> F[Embedding Model]
    F --> G[Vector]
    G --> D
    D --> H[Similarity Score]
```

Key: Text goes in, a vector (list of numbers) comes out. To check whether two texts mean similar things, you compare their vectors using a distance calculation. Close vectors mean similar meanings; distant vectors mean different meanings.
How does it work?
1. Vectors — lists of numbers that encode meaning
A vector is simply an ordered list of numbers. A two-dimensional vector might look like [0.2, 0.8]. A real embedding vector from a modern model has hundreds or thousands of dimensions — for example, OpenAI’s text-embedding-3-small produces vectors with 1,536 numbers.3
Each number in the vector represents some learned aspect of meaning. Unlike a JSON key where you know exactly what "population": 140000 means, individual embedding dimensions do not have human-readable labels. Dimension 47 might partially encode “formality,” dimension 312 might partially encode “scientific domain” — but these are patterns the model learned, not categories a human defined. The meaning emerges from all the numbers taken together.2
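To make this concrete, here is a sketch with invented four-dimensional vectors. The numbers are made up for illustration, not output from a real model — real vectors have hundreds or thousands of dimensions:

```python
# Toy 4-dimensional "embeddings" -- invented numbers for illustration,
# not output from a real embedding model.
dog       = [0.81, 0.12, -0.34, 0.55]
puppy     = [0.79, 0.15, -0.30, 0.52]    # close to "dog": similar meaning
democracy = [-0.62, 0.88, 0.41, -0.07]   # far from both

# No single dimension carries a labelled meaning; the semantics live in
# the whole list of numbers taken together. Every vector from a given
# model has the same fixed length.
print(len(dog), len(puppy), len(democracy))
```

Note that the vectors all have the same length: an embedding model always produces vectors of one fixed dimensionality, which is what makes them directly comparable.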
Think of it like...
A colour code. The hex colour `#FF6B35` means nothing if you look at each character individually, but together they specify an exact shade of orange. Similarly, each number in an embedding means little on its own, but together they specify an exact shade of meaning.
2. The geometry of meaning — close means similar
The power of embeddings comes from a simple geometric principle: distance equals difference in meaning.1
If you plot embedding vectors in space (imagining we could see hundreds of dimensions), you would find clusters. Words about cooking would cluster together. Words about finance would form their own cluster. Words about medicine would form another. And within each cluster, more closely related concepts would sit closer together — “sauteing” near “frying,” both near “cooking,” all far from “mortgage.”4
This clustering is not programmed by hand. It emerges from training. The embedding model reads billions of sentences and learns that “sauteing” and “frying” appear in similar contexts (near words like “pan,” “oil,” “heat”), so it places their vectors close together.
The "king - man + woman = queen" example
One of the most famous demonstrations of embedding geometry comes from early word embedding research (Word2Vec). Researchers found that vector arithmetic on word embeddings produced meaningful results:5
vector(“king”) - vector(“man”) + vector(“woman”) ≈ vector(“queen”)
What this means: if you take the vector for “king,” subtract the direction that encodes “male,” and add the direction that encodes “female,” you arrive near the vector for “queen.” The model learned that the relationship between king and queen mirrors the relationship between man and woman — without anyone telling it so.
This is not a parlour trick. It demonstrates that embeddings capture relationships between concepts, not just individual meanings. The “royalty” dimension, the “gender” dimension, and the “authority” dimension are all encoded in the geometry, and you can navigate between concepts by moving along these dimensions.5
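The arithmetic can be reproduced with hand-made toy vectors. Here dimension 0 loosely stands for "royalty" and dimension 1 for "gender" — a real model learns such directions implicitly from data rather than having them assigned by hand:

```python
import math

# Invented 2-D vectors: dimension 0 ~ "royalty", dimension 1 ~ "gender"
# (positive = female). Real embeddings learn such axes implicitly.
vec = {
    "king":  [0.9, -0.7],
    "queen": [0.9,  0.7],
    "man":   [0.1, -0.7],
    "woman": [0.1,  0.7],
}

def add(a, b): return [x + y for x, y in zip(a, b)]
def sub(a, b): return [x - y for x, y in zip(a, b)]

def cosine(a, b):
    # Angle between two vectors: 1.0 = identical direction.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# king - man + woman: remove the "male" direction, add the "female" one.
result = add(sub(vec["king"], vec["man"]), vec["woman"])
nearest = max(vec, key=lambda w: cosine(vec[w], result))
print(nearest)  # -> queen
```

In this toy space the result lands exactly on "queen"; with real embeddings it lands *near* "queen" rather than exactly on it, which is why the famous equation uses ≈ rather than =.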
3. How embedding models are created
An embedding model is a neural network trained on large amounts of text. The training process works roughly like this:3
- The model reads billions of sentences from books, websites, and articles
- It learns to predict which words appear near each other (context prediction)
- Words that frequently appear in similar contexts get similar vectors
- After training, the model can produce a vector for any new text it has never seen before
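The training idea in the list above can be sketched in miniature: represent each word by the counts of its neighbouring words, and words seen in similar contexts come out similar. Real models learn dense vectors with a neural network; this count-based sketch over a tiny invented corpus only illustrates the principle:

```python
import math
from collections import Counter

# A tiny invented corpus. "sauteing" and "frying" share contexts
# (pan, oil, heat); "mortgage" does not.
corpus = [
    "heat oil in the pan before sauteing the onions",
    "heat oil in the pan before frying the onions",
    "the bank approved the mortgage application",
]

def context_vector(word, window=2):
    """Count the words that appear within `window` tokens of `word`."""
    counts = Counter()
    for sentence in corpus:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(t for t in tokens[lo:hi] if t != word)
    return counts

def cosine(a, b):
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

saute, fry, mortgage = (context_vector(w) for w in ("sauteing", "frying", "mortgage"))
print(cosine(saute, fry))       # high: identical contexts in this corpus
print(cosine(saute, mortgage))  # much lower: almost no shared context
```

Scaled up to billions of sentences, and with the counts replaced by learned dense vectors, this is the intuition behind how embedding models place related words near each other.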
Modern embedding models (like those from OpenAI, Cohere, or open-source models on Hugging Face) go beyond individual words. They embed entire sentences or paragraphs, capturing the meaning of the full passage rather than just individual terms.3
Think of it like...
Learning a language by immersion. If you hear the word “gatto” every time someone points at a cat, pets a cat, or feeds a cat, you learn that “gatto” means cat — without anyone giving you a dictionary. Embedding models learn meaning the same way: by observing which words appear in which contexts, millions of times over.
4. Why embeddings power modern search
Traditional keyword search matches exact words. If you search for “how do I fix a broken window,” it looks for documents containing those exact terms. A document titled “Glass Repair Guide” might not match at all, because it does not contain the words “fix,” “broken,” or “window.”4
Embedding-based search (semantic search) works differently. It converts both the query and every document into vectors, then finds the documents whose vectors are closest to the query vector. “How do I fix a broken window” and “Glass Repair Guide” would have nearby vectors because they are about the same thing — even though they share no words.1
This is the foundation of rag (Retrieval-Augmented Generation), the pattern where an LLM retrieves relevant documents from a knowledge base before generating a response. The retrieval step uses embeddings to find documents that are semantically relevant to the user’s question, not just keyword matches.4
Example: semantic search in action
Consider a knowledge base with these three documents:
| Document | Content |
|---|---|
| Doc A | Steps for replacing a cracked pane in a wooden frame |
| Doc B | How to configure window settings in your operating system |
| Doc C | A history of stained glass in European cathedrals |

Query: “how do I fix a broken window”
Keyword search might return Doc B (contains “window”) and miss Doc A entirely (no shared keywords).
Semantic search compares the meaning of the query to each document’s meaning:
- Doc A: high similarity (both about repairing physical windows)
- Doc B: low similarity (different meaning of “window”)
- Doc C: moderate similarity (about glass, but not repair)
Embeddings resolve the ambiguity because they encode meaning, not just words. The vector for “fix a broken window” is close to the vector for “replacing a cracked pane” because the underlying meaning is the same.
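This ranking can be simulated with toy vectors. The four dimensions here are invented stand-ins for "physical window", "software", "repair", and "glass/history"; in a real system these vectors would come from an embedding model, not be written by hand:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Invented vectors; dimensions loosely encode
# (physical window, software, repair, glass/history).
query = [0.9, 0.1, 0.8, 0.2]   # "how do I fix a broken window"
docs = {
    "Doc A (replacing a cracked pane)": [0.8, 0.0, 0.9, 0.3],
    "Doc B (OS window settings)":       [0.1, 0.9, 0.2, 0.0],
    "Doc C (history of stained glass)": [0.3, 0.0, 0.1, 0.9],
}

# Rank documents by similarity to the query, best first.
ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
for doc in ranked:
    print(f"{cosine(query, docs[doc]):.2f}  {doc}")
```

With these numbers Doc A ranks first, Doc C second, and Doc B last — matching the keyword-blind ranking described above, because similarity is computed on meaning-encoding vectors rather than on shared words.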
5. Dimensions and distance measures
Two common ways to measure how close two vectors are:2
- Cosine similarity measures the angle between two vectors. A score of 1.0 means identical direction (identical meaning), 0.0 means unrelated, and -1.0 means opposite. This is the most common measure for text embeddings because it ignores vector length and focuses purely on direction.
- Euclidean distance measures the straight-line distance between two points. Smaller distance means more similar. This is more intuitive geometrically but can be affected by vector magnitude.
In practice, most embedding-based search systems use cosine similarity because it is robust and fast to compute.2
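Both measures take only a few lines to implement. A minimal sketch in plain Python (production systems use optimised libraries or a vector database, but the maths is the same):

```python
import math

def cosine_similarity(a, b):
    # Angle-based: ignores vector length, compares direction only.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def euclidean_distance(a, b):
    # Straight-line distance: sensitive to magnitude as well as direction.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]   # same direction as a, twice the length

print(cosine_similarity(a, b))    # ~1.0: identical direction
print(euclidean_distance(a, b))   # > 0: the magnitudes differ
```

The example shows why the two measures can disagree: `b` points in exactly the same direction as `a`, so cosine similarity calls them identical, while Euclidean distance reports them as far apart because one vector is twice as long.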
Key distinction
Embeddings represent meaning as geometry. Cosine similarity measures that geometry. Together, they let you answer the question “how similar are these two pieces of text?” with a number — no keyword matching required.
Why do we use it?
Key reasons
1. Semantic understanding. Embeddings let machines compare meanings, not just strings. “Automobile” and “car” are recognised as near-identical even though they share no characters. This is foundational for search, recommendation, and classification.1
2. Language-agnostic matching. Multilingual embedding models place “dog,” “chien,” and “Hund” near each other in vector space. You can search in English and find relevant documents written in French or German.3
3. Efficiency at scale. Comparing two vectors is a simple mathematical operation that takes microseconds. This makes it possible to search millions of documents in real time — something that would be impossible if every comparison required an LLM call.2
4. Foundation for RAG. Retrieval-Augmented Generation — the dominant pattern for grounding LLM responses in real data — depends entirely on embeddings to find the right documents to feed to the model.4
When do we use it?
- When building semantic search that finds results by meaning, not just keywords
- When implementing RAG to ground LLM responses in relevant documents from a knowledge base
- When you need to classify or cluster text (group similar support tickets, detect duplicate questions)
- When building recommendation systems (find articles similar to ones a user liked)
- When you need to compare text across languages without translation
Rule of thumb
If the task requires understanding what text means rather than what words it contains, embeddings are almost certainly part of the solution.
How can I think about it?
The library with invisible shelving
Imagine a library where books are not shelved alphabetically or by genre, but by meaning. Books about cooking sit next to books about nutrition, which sit next to books about food science, which sit next to books about chemistry. A book about Italian cooking would be on the same shelf as a book about making pasta from scratch, even though their titles share no words.
- Each book’s position = its embedding vector (coordinates in the library)
- Nearby books = semantically similar content
- Finding a book = computing the vector for your query and walking to that spot in the library
- The shelving system = the embedding model that decided where to place each book
- No card catalogue needed = no keyword index, because proximity is the index
This library would be useless for humans (you cannot see 1,536 dimensions), but it is exactly how a computer navigates a knowledge base using embeddings.
The colour wheel of language
Think of the colour wheel. Red, orange, and yellow are neighbours — they blend smoothly into each other. Red and green are on opposite sides — maximally different. You do not need to describe a colour in words to know how similar it is to another colour; you just check their positions on the wheel.
- Each colour = a word or sentence
- Position on the wheel = its embedding vector
- Nearby colours blend = similar meanings cluster together
- The wheel has many dimensions = a real embedding space has hundreds of axes, not just hue and saturation, capturing nuances like formality, domain, sentiment, and topic simultaneously
- Mixing colours = vector arithmetic (king - man + woman = queen is like mixing hues to get a new colour)
Embeddings extend the colour wheel idea to language: every piece of text gets a position, and you navigate meaning by moving through the space.
Concepts to explore next
| Concept | What it covers | Status |
|---|---|---|
| rag | Retrieving relevant documents via embeddings before generating a response | stub |
| vector-databases | Specialised databases optimised for storing and querying embedding vectors at scale | stub |
| knowledge-graphs | Structured representations of relationships between concepts, complementary to embeddings | stub |
Some cards don't exist yet
A broken link is a placeholder for future learning, not an error.
Check your understanding
Test yourself
- Explain what an embedding is to someone who has never encountered the concept. Use the GPS coordinates analogy or another everyday comparison.
- Name the three key properties of embeddings that make them useful (semantic similarity as distance, produced by trained models, high-dimensional vectors) and describe why each matters.
- Distinguish between keyword search and semantic search. Give a concrete example where keyword search fails but semantic search succeeds.
- Interpret the “king - man + woman = queen” example. What does it tell you about what embedding vectors actually encode?
- Connect embeddings to the concept of RAG. Why are embeddings essential for retrieval-augmented generation, and what would break without them?
Where this concept fits
Position in the knowledge graph
```mermaid
graph TD
    KE[Knowledge Engineering] --> MRF[Machine-Readable Formats]
    MRF --> JSON[JSON]
    MRF --> EMB[Embeddings]
    MRF --> SDvP[Structured Data vs Prose]
    style EMB fill:#4a9ede,color:#fff
```

Related concepts:
- rag — embeddings are the retrieval mechanism that powers RAG, finding semantically relevant documents for an LLM to use as context
- vector-databases — specialised storage systems built to index, store, and query embedding vectors at scale
- knowledge-graphs — represent relationships explicitly as nodes and edges, complementing the implicit relationship encoding of embeddings
- json — while JSON encodes data as key-value pairs for deterministic parsing, embeddings encode meaning as vectors for similarity comparison
Sources
Further reading
Resources
- An Intuitive Introduction to Text Embeddings (Stack Overflow Blog) — Practitioner-focused explanation with strong intuitions for how embeddings shape product decisions
- What Are AI Embeddings? A Plain-English Guide (Awesome Agents) — Beginner-friendly guide covering what embeddings are, how they work, and why they matter for AI systems
- Embeddings in Plain English (PractiqAI) — Practical guide covering distance measures, model selection, and real-world applications like clustering and deduplication
- What Are Vector Embeddings? A Complete Guide (mem0) — Comprehensive technical overview covering embedding types, creation methods, and integration patterns
- What Are Embeddings? How AI Represents Meaning (Machine Brief) — Intermediate-level guide connecting embeddings to modern AI architecture and the broader machine learning landscape
Footnotes
1. Raghavan, P. (2026). What Are AI Embeddings? A Plain-English Guide. Awesome Agents.
2. PractiqAI. (2025). Embeddings in Plain English. PractiqAI.
3. Singh, T. (2026). What Are Vector Embeddings? A Complete Guide. mem0.
4. Stack Overflow. (2023). An Intuitive Introduction to Text Embeddings. Stack Overflow Blog.
5. Allen, C. and Hospedales, T. (2019). King - man + woman = queen: the hidden algebraic structure of words. University of Edinburgh School of Informatics.