Retrieval Engineering
Hybrid search & reranking
Lesson 4 of 5
What you'll learn
- Understand why dense and sparse retrieval are complementary
- Fuse keyword and vector scores into one hybrid ranking
- Apply a reranking stage over the top candidates
Vector search captures meaning but can fumble exact tokens — error codes, SKUs, function names, rare proper nouns. Keyword search (BM25, a sparse term-frequency method) nails those exact matches but is blind to paraphrase. Hybrid search runs both and fuses their scores, so "how do I get my money back" can still surface a chunk that literally says "refund."
Fusing two score scales
Dense and sparse scores live on different scales, so you can't just add them. Two common fixes: normalize each score to [0, 1] and take a weighted sum, or use Reciprocal Rank Fusion (RRF), which combines ranks instead of raw scores and is pleasantly scale-free.
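// weighted fusion: final = alpha * denseNorm + (1 - alpha) * sparseNorm, with alpha in [0, 1]
// RRF: score(doc) = sum over retrievers of 1 / (k + rank of doc in that retriever), where k is a smoothing constant (60 is a common choice)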
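Both fusion rules fit in a few lines. The following is a minimal Python sketch, not a reference implementation: the function names (`min_max`, `weighted_fusion`, `rrf`), the toy scores, and the defaults `alpha=0.5` and `k=60` are all assumptions made for illustration.

```python
def min_max(scores):
    """Rescale a {doc_id: score} dict to [0, 1]; all-equal inputs map to 0.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def weighted_fusion(dense, sparse, alpha=0.5):
    """final = alpha * denseNorm + (1 - alpha) * sparseNorm.

    Assumption: a doc missing from one retriever contributes 0 for it.
    """
    d, s = min_max(dense), min_max(sparse)
    return {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in set(d) | set(s)
    }


def rrf(rankings, k=60):
    """Sum 1 / (k + rank) over retrievers; ranks start at 1."""
    fused = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return fused


# Toy scores (made up): cosine-like for dense, BM25-like for sparse.
dense = {"a": 0.82, "b": 0.80, "c": 0.40}
sparse = {"b": 12.0, "c": 9.0, "d": 3.0}

hybrid = weighted_fusion(dense, sparse, alpha=0.5)
hybrid_order = sorted(hybrid, key=hybrid.get, reverse=True)

# RRF only needs each retriever's ordering, not its raw scores.
dense_rank = sorted(dense, key=dense.get, reverse=True)
sparse_rank = sorted(sparse, key=sparse.get, reverse=True)
rrf_scores = rrf([dense_rank, sparse_rank])
rrf_order = sorted(rrf_scores, key=rrf_scores.get, reverse=True)

print(hybrid_order)  # ['b', 'a', 'c', 'd']
print(rrf_order)     # ['b', 'c', 'a', 'd']
```

On this toy data the two methods agree on the top result but order "a" and "c" differently: weighted fusion rewards "a" for its high dense score, while RRF rewards "c" for appearing in both lists.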
Then rerank the shortlist
Hybrid fusion is cheap and runs over the whole index, but it's coarse. So you over-fetch (pull the top ~50) and hand them to a reranker: a cross-encoder that reads the query and each candidate together and scores how relevant the pair is. Because it sees both texts at once, it's usually more accurate than comparing two independently computed embeddings. It also needs a full model pass per query-candidate pair, so it's too expensive to run over millions of docs, which is why it only sees the shortlist. Retrieve wide and cheap, rerank narrow and precise.
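The control flow of retrieve-then-rerank can be sketched as follows. This is an illustration under stated assumptions: `retrieve_then_rerank` and its parameters are hypothetical names, and `toy_pair_score` is a token-overlap stand-in for a real cross-encoder, used only so the example runs without a model.

```python
from typing import Callable


def retrieve_then_rerank(
    query: str,
    fused_scores: dict[str, float],
    texts: dict[str, str],
    score_pair: Callable[[str, str], float],
    fetch_k: int = 50,
    final_k: int = 5,
) -> list[str]:
    """Over-fetch fetch_k docs by fused score, then reorder them with score_pair."""
    shortlist = sorted(fused_scores, key=fused_scores.get, reverse=True)[:fetch_k]
    # The expensive scorer runs once per shortlisted doc, never over the whole index.
    reranked = sorted(
        shortlist, key=lambda doc: score_pair(query, texts[doc]), reverse=True
    )
    return reranked[:final_k]


def toy_pair_score(query: str, text: str) -> float:
    """Stand-in for a cross-encoder: fraction of query tokens found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


# Made-up corpus and first-stage (fused) scores.
texts = {
    "a": "shipping times for international orders",
    "b": "how to request a refund for an order",
    "c": "refund policy for returns",
    "d": "reset your password",
}
fused = {"a": 0.9, "b": 0.7, "c": 0.6, "d": 0.1}

calls = []  # records every text the pair scorer is asked to score


def counting_scorer(query: str, text: str) -> float:
    calls.append(text)
    return toy_pair_score(query, text)


top = retrieve_then_rerank(
    "refund for an order", fused, texts, counting_scorer, fetch_k=3, final_k=2
)
print(top)         # ['b', 'c']
print(len(calls))  # 3: only the shortlist was scored
```

Document "a" led the first stage but drops out after reranking, and "d" was never scored at all, so the cost of the second stage depends on `fetch_k` rather than on index size.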
Recall first, precision second
The first stage's only job is recall: get every relevant chunk into the candidate pool. The reranker's job is precision: order that pool correctly. If a document never makes the shortlist, no reranker can save it — so tune the first stage to over-fetch.
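One way to check whether the first stage over-fetches enough is to measure recall at the shortlist size. The sketch below assumes you have a set of known-relevant document IDs for a query; the IDs and the helper name `recall_at_k` are made up for illustration.

```python
def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant docs that appear in the top-k of ranked."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


# Hypothetical first-stage ordering and relevance labels for one query.
first_stage = ["d1", "d7", "d3", "d9", "d4", "d2"]
relevant = {"d3", "d2"}

narrow = recall_at_k(first_stage, relevant, k=3)  # only d3 is in the top 3
wide = recall_at_k(first_stage, relevant, k=6)    # d3 and d2 are both in

print(narrow)  # 0.5
print(wide)    # 1.0
```

With a shortlist of 3, "d2" never reaches the reranker, so the best possible final ranking is still missing half the relevant documents. Widening the shortlist to 6 removes that ceiling.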
Run it. It normalizes a keyword score and a vector score into a hybrid ranking, then reranks the top-k with a combined relevance signal.
Why is a cross-encoder reranker run only over a small shortlist rather than the whole index?