Retrieval Engineering

Embeddings & cosine similarity

Lesson 1 of 5

What you'll learn

Treat embeddings as coordinates of meaning, not opaque blobs
Compute cosine similarity and rank documents by it
Understand why direction matters more than magnitude

An embedding maps text to a fixed-length vector. The model is trained so that text with similar meaning lands at nearby points in that space, and unrelated text lands far apart. Retrieval is then a geometry problem: embed the query, embed the corpus, and find the corpus vectors closest to the query.

"Closest" usually means highest cosine similarity: the cosine of the angle between two vectors. It ranges from -1 (opposite directions) through 0 (orthogonal, which typically means unrelated) to 1 (identical direction).

// cosine(a, b) = dot(a, b) / (|a| * |b|)
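
As a minimal sketch, the formula can be written out in plain Python. The function name `cosine` and the toy two-dimensional vectors are illustrative assumptions; real embeddings have hundreds or thousands of dimensions and would normally be handled by a numeric library.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle to a zero vector is undefined.
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors chosen to hit the three landmark values.
same = cosine([1.0, 2.0], [2.0, 4.0])         # same direction, close to 1
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])   # right angle, 0
opposite = cosine([1.0, 2.0], [-1.0, -2.0])   # opposite direction, close to -1
print(same, orthogonal, opposite)
```

Floating-point rounding means the first and last results may differ from 1 and -1 in the last few digits, so compare with a tolerance rather than with `==`.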

Why direction, not length

Embedding magnitude tends to track incidental things — document length, token frequency, formatting — not meaning. Cosine throws magnitude away and compares only orientation, which is exactly the signal you want. A short answer and a long answer about the same topic should rank as similar, and cosine makes that happen.
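
To make the point concrete, here is a small sketch comparing a raw dot product with cosine when one vector is simply a scaled copy of another. The vectors and the names `short_doc` and `long_doc` are made up for illustration; they stand in for two documents whose embeddings point the same way but differ in length.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

query = [0.3, 0.0, 0.5]
short_doc = [0.2, 0.1, 0.4]
long_doc = [10 * x for x in short_doc]  # same direction, 10x the magnitude

# The raw dot product grows with magnitude...
dot_short = dot(query, short_doc)
dot_long = dot(query, long_doc)

# ...while cosine depends only on direction.
cos_short = cosine(query, short_doc)
cos_long = cosine(query, long_doc)

print(dot_short, dot_long)
print(cos_short, cos_long)
```

The two dot products differ by the scale factor, while the two cosine scores agree up to floating-point rounding.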

The ranking loop in practice

At query time you compute one similarity per candidate, then sort descending and keep the top results. With millions of vectors you wouldn't scan them all (later lessons cover how to avoid a full scan), but the score for each candidate is still the same dot product divided by the norms.
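
The loop described above can be sketched as a brute-force scan. The document ids, the three-dimensional vectors, and the `rank` helper with its `top_k` parameter are all hypothetical; this is one straightforward way to write it, not a production retriever.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query, docs, top_k=2):
    """Score every candidate, sort by score descending, keep the top_k."""
    scored = [(cosine(query, vec), doc_id) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Made-up vectors standing in for real embeddings.
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.3],
    "doc_c": [0.7, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]

ranked = rank(query, docs, top_k=2)
for score, doc_id in ranked:
    print(f"{doc_id}: {score:.3f}")
```

With these vectors `doc_a` ranks first and `doc_c` second, because both point in roughly the same direction as the query while `doc_b` does not.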

Normalize once, score cheap

If you L2-normalize every vector at index time, and the query vector once at query time, each magnitude becomes 1 and cosine similarity collapses to a plain dot product. That removes the norm computations (the square roots) and the division from every score on the hot path.
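
A short sketch of that equivalence, assuming a hypothetical `l2_normalize` helper and toy two-dimensional vectors: normalizing once up front and then taking plain dot products gives the same scores as computing full cosine on the raw vectors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale v to unit length (L2 norm of 1)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

raw_docs = [[3.0, 4.0], [1.0, 0.0], [-2.0, 1.0]]
raw_query = [2.0, 1.0]

# Index time: normalize every document vector once and store the result.
index = [l2_normalize(v) for v in raw_docs]

# Query time: normalize the query once, then each score is a plain dot product.
q = l2_normalize(raw_query)
fast = [dot(q, v) for v in index]

# Reference: full cosine on the unnormalized vectors.
slow = [cosine(raw_query, v) for v in raw_docs]

print(fast)
print(slow)
```

The two lists match up to floating-point rounding, so the ranking is unchanged while the per-candidate work shrinks to one dot product.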

Rank documents by cosine

Run it. It scores three candidate document vectors against a query vector and prints them ranked from most to least similar.


Knowledge check

Why does cosine similarity ignore vector magnitude when ranking embeddings?

