Hybrid search with Qdrant: what nobody tells you about BM25 + dense + image
What you actually wire up when you blend keyword, dense vector and image embeddings into one ranker — named vectors, fusion, drift, and the day your Turkish search broke on apostrophes.
We run a search engine over a multilingual product catalogue — Turkish, English, Macedonian — with about 7.8 million live SKUs and roughly 90k queries an hour at peak. The frontend has one search box. The backend runs three retrieval signals in parallel and fuses them. None of the three is good enough alone. Together they're decent.
This post is the field notes I wish someone had written for me eighteen months ago.
Why one signal isn't enough
The clean story goes: dense vectors beat keyword search because they understand semantics. The clean story is mostly wrong in production.
- BM25 wins on brand names, model numbers, SKU codes, anything where the user typed an exact string they remember.
- Dense vectors win on intent ("warm jacket for a kid" → fleece coats, sized 6–12y, in the children's category) but lose on apostrophes, ligatures, and Turkish suffixes.
- Image embeddings win on the queries you didn't think were queries — a photo, a screenshot, a vague visual idea ("something like that bag").
Each signal recovers what the other two drop. The art is in the fusion.
The Qdrant shape
We store all three vectors on the same point, using Qdrant's named vector feature. One collection, one upsert path, three retrieval modes:
```python
from qdrant_client import QdrantClient, models

# QDRANT_URL and QDRANT_API_KEY come from your own configuration.
client = QdrantClient(url=QDRANT_URL, api_key=QDRANT_API_KEY)

client.create_collection(
    collection_name="products",
    # Dense named vectors: one for text, one for images.
    vectors_config={
        "text": models.VectorParams(size=512, distance=models.Distance.COSINE),
        "image": models.VectorParams(size=512, distance=models.Distance.COSINE),
    },
    # Sparse named vector for BM25; Qdrant applies the IDF weighting.
    sparse_vectors_config={
        "bm25": models.SparseVectorParams(
            modifier=models.Modifier.IDF,
        ),
    },
)
```

The dense vectors are 512-dim from a multilingual sentence-transformer fine-tuned on our category tree. The image vector comes from the same model used in image-only mode (so cross-modal retrieval works: you can text-search a photo). BM25 is Qdrant's native sparse index with IDF re-weighting.
Three vectors, one point, no JOIN. This is the part that surprised me — until I tried to maintain three separate collections and the drift between them ate two weeks of my life.
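The sparse side is the least obvious of the three, so here is a minimal sketch of what a query-side sparse encoder could look like. Everything in it is an assumption for illustration: `sparse_query_terms` is a hypothetical helper, not our production tokenizer, and hashing tokens with CRC32 is just one cheap way to get stable 32-bit indices. With the IDF modifier enabled on the collection, the client only supplies term indices and raw term weights.

```python
import re
import zlib
from collections import Counter


def sparse_query_terms(text: str) -> tuple[list[int], list[float]]:
    """Turn a query into (indices, values) for a sparse vector.

    Hypothetical sketch: tokens are hashed to unsigned 32-bit indices with
    CRC32 (stable across processes, unlike Python's built-in hash()).
    Values are plain term counts; the IDF part is left to the index.
    """
    # Naive tokenizer: lowercase, split on non-word characters.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(zlib.crc32(tok.encode("utf-8")) for tok in tokens)
    indices = sorted(counts)
    values = [float(counts[i]) for i in indices]
    return indices, values


indices, values = sparse_query_terms("red red dress")
# Two distinct terms, one of them seen twice.
print(len(indices), sorted(values))
```

The two lists can be passed to `models.SparseVector(indices=..., values=...)`. The same hashing has to be used when indexing documents, and hash collisions between distinct tokens are possible, if rare.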
Retrieving
A query goes out as three parallel searches, then we fuse:
```python
import asyncio

# `client` must be an AsyncQdrantClient here; the synchronous QdrantClient
# returns plain results that cannot be awaited.
async def hybrid_search(query: str, limit: int = 50) -> list[models.ScoredPoint]:
    # Build the three query representations concurrently.
    text_vec, sparse, image_vec = await asyncio.gather(
        embed_text(query),
        sparse_bm25(query),
        embed_image_from_text(query),  # text → image embedding via CLIP-style head
    )
    # Each prefetch pulls 200 candidates from one named vector; RRF fuses them.
    results = await client.query_points(
        collection_name="products",
        prefetch=[
            models.Prefetch(query=text_vec, using="text", limit=200),
            models.Prefetch(query=sparse, using="bm25", limit=200),
            models.Prefetch(query=image_vec, using="image", limit=200),
        ],
        query=models.FusionQuery(fusion=models.Fusion.RRF),
        limit=limit,
    )
    return results.points
```

Reciprocal Rank Fusion (RRF) is shockingly hard to beat. We tried weighted score fusion, learned rerankers, even a small LLM rerank pass; RRF gave us 90% of the win for 5% of the complexity. We do run a Gemini rerank on the top 50 for premium queries, but for the long tail RRF is what ships.
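For intuition about why rank fusion is so robust, here is a minimal pure-Python sketch of the classic RRF formula, where each list contributes `1 / (k + rank)` per document. This is an illustration, not Qdrant's implementation: `rrf_fuse` is a hypothetical helper, and `k=60` is the commonly quoted default, which may differ from the constant Qdrant uses internally.

```python
from collections import defaultdict


def rrf_fuse(
    rankings: list[list[str]], k: int = 60, limit: int = 10
) -> list[tuple[str, float]]:
    """Fuse several ranked ID lists with Reciprocal Rank Fusion.

    Only ranks matter, so the raw scores of each retriever never need
    to be put on a common scale.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return fused[:limit]


dense = ["a", "b", "c"]
sparse = ["b", "a", "d"]
image = ["b", "c", "a"]
print([doc_id for doc_id, _ in rrf_fuse([dense, sparse, image])])
# → ['b', 'a', 'c', 'd']
```

A document that is merely good in all three lists beats one that tops a single list, which is exactly the behaviour you want when no signal is trustworthy alone.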
What actually breaks
Now the part the tutorials skip.
Apostrophes. Turkish users type Adidas'in (Adidas's). Your sparse tokenizer breaks that into adidas + in. Your dense model is fine. Your sparse score drops 60%. Solution: a tokenizer pre-pass that normalises possessive suffixes per language.
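A minimal sketch of such a pre-pass, under loud assumptions: `strip_possessive` and its per-language rules are hypothetical, not our production normaliser. It relies on the Turkish convention that an apostrophe separates a proper noun from its suffix, so everything after the apostrophe can be dropped, while English only loses a trailing `'s`.

```python
import re

# Straight, typographic and modifier-letter apostrophes all show up in queries.
_APOSTROPHES = "'’ʼ"

# Hypothetical per-language rules; languages without a rule pass through.
_SUFFIX_AFTER_APOSTROPHE = {
    # Turkish: drop whatever suffix follows the apostrophe.
    "tr": re.compile(rf"(\w+)[{_APOSTROPHES}]\w+"),
    # English: drop only the possessive 's, leave contractions alone.
    "en": re.compile(rf"(\w+)[{_APOSTROPHES}]s\b"),
}


def strip_possessive(text: str, lang: str) -> str:
    """Remove apostrophe-attached suffixes before sparse tokenization."""
    pattern = _SUFFIX_AFTER_APOSTROPHE.get(lang)
    if pattern is None:
        return text
    return pattern.sub(r"\1", text)


print(strip_possessive("Adidas'in ayakkabı", "tr"))  # → Adidas ayakkabı
print(strip_possessive("Nike's shoes", "en"))        # → Nike shoes
```

Run it on both the indexed text and the query, otherwise you have just moved the mismatch somewhere else.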
Synonyms with no overlap. "Sneaker" in English maps semantically to "spor ayakkabı" in Turkish — but BM25 sees them as different tokens. Dense saves you here, but only if your model was trained multilingually. We use a multilingual encoder and accept that for Turkish brand names BM25 carries the load.
Image queries that aren't images. When a user types "red dress for a wedding" we run that text through a text→image projection head to get an image-space query vector. This is what makes the image index useful for text queries. Without it, the image index is dead weight when the user isn't holding a photo.
Drift. The single sharpest knife. The encoder model that vectorised the catalogue and the encoder model that vectorises the query must be the same version. The moment they diverge, recall collapses silently. We pin model versions in two places — the batch vectorizer service and the search service — and we re-vectorize the whole catalogue when we bump.
```go
// Inside the search service: refuse to start if the embedding version
// doesn't match what the catalogue was indexed with.
// Requires the standard "log" package.
func mustEmbedderVersion() {
	// A type assertion yields (value, ok), not (value, error).
	expected, ok := qdrantPayload.Get("embedder_version").(string)
	if !ok || expected != embedder.Version {
		log.Fatalf("embedder drift: catalogue=%s service=%s",
			expected, embedder.Version)
	}
}
```

You can either run that check at startup (we do) or accept that you'll discover the drift through angry customers (also a valid path, just less pleasant).
Trade-offs
Hybrid search costs you:
- Three times the embedding compute at index time. Mitigated by batching and async pipelines, but it's real money.
- Roughly three times the storage. Qdrant compresses well, but at 7.8M points × 2 dense vectors × 512 floats, plus a sparse BM25 vector per point, you're talking real disk.
- Complexity at query time. Three parallel calls, fusion, optional rerank. Latency is fine — fan-out is fast — but the failure surface is wider.
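A back-of-envelope check on the dense part of that storage bill. The assumptions are mine: uncompressed float32 (4 bytes per dimension), only the two 512-dim named vectors from the collection config, and no index or payload overhead. The sparse BM25 vector is left out because its size depends on document length.

```python
def dense_storage_bytes(
    points: int, vectors_per_point: int, dims: int, bytes_per_dim: int = 4
) -> int:
    """Raw size of the dense vectors alone, ignoring index and payload."""
    return points * vectors_per_point * dims * bytes_per_dim


raw = dense_storage_bytes(7_800_000, 2, 512)
print(f"float32: {raw / 1e9:.1f} GB")  # → float32: 31.9 GB

# One byte per dimension, e.g. an int8-quantized copy of the same vectors.
quantized = dense_storage_bytes(7_800_000, 2, 512, bytes_per_dim=1)
print(f"int8:    {quantized / 1e9:.1f} GB")  # → int8:    8.0 GB
```

Whether the quantized copy replaces or sits alongside the originals depends on how the collection is configured, so treat the second number as a lower bound on what quantization can buy.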
You don't need any of this until you have:
- A multilingual catalogue
- A query mix where BM25 alone is missing things you can measure
- Enough query volume that the fix justifies the engineering
If your catalogue is English-only and 10k SKUs, BM25 with a decent synonym table will outperform a hybrid setup for a year. The day it stops, you'll know.
The thing I keep telling new hires
The search problem isn't the model. It's the chunking. It's the tokenizer. It's the drift between two services that were supposed to be the same version. The vectors are the easy part. The pipeline that keeps them in sync, day after day, with new products landing every minute — that's the part that takes a senior engineer.
The model is doing math on numbers it was handed. If the numbers are wrong, the math is wrong. That's it.