Hybrid Search & Retrieval

Search that returns results isn't the same as search that finds things. The gap between the two is where catalogs quietly lose revenue — and it's measurable, which means it's fixable.

When you'd call me

Search "works" — no errors, results come back — but customers type a product they want and don't find items you actually stock.
You're choosing or tuning a vector database and the public benchmarks look nothing like your data.
Your catalog is multilingual — Turkish included — and tokenization or embedding behavior breaks in ways the docs never mention.
You want image-based search and nobody on the team can say whether CLIP, SigLIP or something else fits your catalog.

What I do

Hybrid retrieval — BM25, dense vectors and image embeddings fused with RRF, with the k parameter tuned against your queries instead of left at the default.
Reranker strategy — measuring when a cross-encoder pays for its latency and when plain BM25 already wins.
Embedding model selection with evidence — I've run CLIP against SigLIP head-to-head on a Turkish-heavy catalog and kept the receipts.
Eval harness setup — a labeled query set built from your real logs, so "better" becomes nDCG and recall instead of an opinion in a meeting.
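
For readers who have not met RRF: the fusion step itself is small, which is why the k parameter carries so much weight. Below is a minimal sketch of reciprocal rank fusion over three ranked lists. The function name `rrf_fuse`, the sample SKU ids and the three list names are illustrative assumptions, not code from any production system; only the scoring rule (each list contributes 1 / (k + rank) per document) is the standard one.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document ids with reciprocal rank fusion.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant; larger k flattens the advantage of top ranks.
    Returns document ids ordered by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results for one query from three retrieval signals.
bm25_hits = ["sku-1", "sku-2", "sku-3"]
dense_hits = ["sku-2", "sku-1", "sku-4"]
image_hits = ["sku-2", "sku-4", "sku-1"]

fused = rrf_fuse([bm25_hits, dense_hits, image_hits], k=60)
```

Tuning k against a labeled query set means re-running this fusion for a handful of k values and keeping the one that scores best on the metric you care about, rather than assuming 60 fits every catalog.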

Numbers, not adjectives

Nova's catalog: 7 million products, three retrieval signals fused with RRF, image search through SigLIP. nDCG@10 went from 0.61 to 0.74 measured against the real query distribution — and the eval harness is what made every tuning decision defensible instead of arguable.
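
To make the metric concrete, here is a rough sketch of how nDCG@10 and recall@10 can be computed for a single labeled query. It assumes graded relevance labels stored as a dict from document id to grade and uses the linear-gain form of DCG; the function names and sample labels are hypothetical, and a real harness would average these values over the whole query set.

```python
import math


def dcg_at_k(gains, k=10):
    """Discounted cumulative gain over the first k gains (linear gain variant)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, labels, k=10):
    """nDCG@k for one query; unlabeled documents count as grade 0."""
    gains = [labels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(labels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def recall_at_k(ranked_ids, labels, k=10):
    """Share of relevant documents (grade > 0) found in the top k."""
    relevant = {d for d, g in labels.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)


# Hypothetical labels for one query: grade 2 = exact match, 1 = acceptable.
labels = {"sku-1": 2, "sku-2": 1}
perfect = ndcg_at_k(["sku-1", "sku-2"], labels)
swapped = ndcg_at_k(["sku-2", "sku-1"], labels)
half_found = recall_at_k(["sku-1", "sku-9"], labels, k=2)
```

A ranking that puts the best-labeled item first scores 1.0; swapping the two items lowers the score, which is the behavior that lets a tuning change be judged by a number.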

Field notes

Hybrid search with Qdrant: what nobody tells you about BM25 + dense + image
The architecture behind the 7-million-product index.

Six months with RRF in production: what k=60 doesn't tell you
What the default k value got right — and where it didn't.

CLIP vs SigLIP for a Turkish product catalog: a brand-affinity ablation
Embedding choice settled by measurement, not preference.

The week BM25 beat my cross-encoder — and why I kept the reranker anyway
The query class where the simplest ranker won.

Where we'd start

Discovery: I evaluate your current search against your real query distribution — sample queries from the logs, label the results, measure. You get numbers on where it fails and a fix list ordered by impact, not by what's fashionable.

Tell me about your search