Hybrid Search & Retrieval
Search that returns results isn't the same as search that finds things. The gap between the two is where catalogs quietly lose revenue — and it's measurable, which means it's fixable.
When you'd call me
- Search "works" — no errors, results come back — but customers type a product they want and don't find items you actually stock.
- You're choosing or tuning a vector database and the public benchmarks look nothing like your data.
- Your catalog is multilingual — Turkish included — and tokenization or embedding behavior breaks in ways the docs never mention.
- You want image-based search and nobody on the team can say whether CLIP, SigLIP or something else fits your catalog.
What I do
- Hybrid retrieval — BM25, dense vectors and image embeddings fused with RRF, with the k parameter tuned against your queries instead of left at the default.
- Reranker strategy — measuring when a cross-encoder pays for its latency and when plain BM25 already wins.
- Embedding model selection with evidence — I've run CLIP against SigLIP head-to-head on a Turkish-heavy catalog and kept the receipts.
- Eval harness setup — a labeled query set built from your real logs, so "better" becomes nDCG and recall instead of an opinion in a meeting.
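For readers who want the mechanics of the fusion step: Reciprocal Rank Fusion scores each document by summing 1 / (k + rank) over every ranked list it appears in. The sketch below is a minimal illustration, not the production pipeline. The function name `rrf_fuse`, the product ids and the three toy rankings are all hypothetical, and k=60 is only the commonly used default that tuning would replace.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    rankings: list of lists, each ordered best-first.
    k: smoothing constant; larger values flatten the gap between ranks.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 lists from the three signals (illustrative ids only).
bm25 = ["p1", "p2", "p3"]
dense = ["p2", "p4", "p1"]
image = ["p4", "p2", "p5"]

fused = rrf_fuse([bm25, dense, image], k=60)
print(fused)
```

Because only ranks enter the formula, the signals need no score normalization. The trade-off is that k decides how much a first place in one list outweighs middling places in several, which is why it is worth tuning against labeled queries.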
Numbers, not adjectives
Nova's catalog: 7 million products, three retrieval signals fused with RRF, image search through SigLIP. nDCG@10 went from 0.61 to 0.74 measured against the real query distribution — and the eval harness is what made every tuning decision defensible instead of arguable.
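For context on what a figure like nDCG@10 measures, here is a minimal sketch of the metric alongside recall@k. It assumes graded relevance labels per query and uses linear gain; some harnesses use 2^gain - 1 instead, and nothing here is taken from the Nova harness itself. The function names, product ids and labels are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (linear gain)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """nDCG@k for one query; relevance maps doc id -> graded label."""
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, relevance, k=10):
    """Share of relevant docs (label > 0) that appear in the top k."""
    relevant = {d for d, g in relevance.items() if g > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)


# Hypothetical labels for a single query: 3 = exact match, 1 = acceptable.
labels = {"p1": 3, "p2": 1}

perfect = ndcg_at_k(["p1", "p2", "p9"], labels)
swapped = ndcg_at_k(["p2", "p1", "p9"], labels)
partial_recall = recall_at_k(["p2", "p9"], labels, k=2)
```

A catalog-level number is then the mean of the per-query scores over the labeled query set, so the score is only as representative as the sample of queries behind it.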
Field notes
- Hybrid search with Qdrant: what nobody tells you about BM25 + dense + image
  The architecture behind the 7-million-product index.
- Six months with RRF in production: what k=60 doesn't tell you
  What the default k value got right — and where it didn't.
- CLIP vs SigLIP for a Turkish product catalog: a brand-affinity ablation
  Embedding choice settled by measurement, not preference.
- The week BM25 beat my cross-encoder — and why I kept the reranker anyway
  The query class where the simplest ranker won.
Where we'd start
Discovery: I evaluate your current search against your real query distribution — sample queries from the logs, label the results, measure. You get numbers on where it fails and a fix list ordered by impact, not by what's fashionable.