Özgür Işık Damar

Hybrid Search & Retrieval

Search that returns results isn't the same as search that finds things. The gap between the two is where catalogs quietly lose revenue — and it's measurable, which means it's fixable.

When you'd call me

  • Search "works" (no errors, results come back), but customers search for a product you actually stock and don't find it.
  • You're choosing or tuning a vector database and the public benchmarks look nothing like your data.
  • Your catalog is multilingual — Turkish included — and tokenization or embedding behavior breaks in ways the docs never mention.
  • You want image-based search and nobody on the team can say whether CLIP, SigLIP or something else fits your catalog.

What I do

  • Hybrid retrieval — BM25, dense vectors and image embeddings fused with RRF, with the k parameter tuned against your queries instead of left at the default.
  • Reranker strategy — measuring when a cross-encoder pays for its latency and when plain BM25 already wins.
  • Embedding model selection with evidence — I've run CLIP against SigLIP head-to-head on a Turkish-heavy catalog and kept the receipts.
  • Eval harness setup — a labeled query set built from your real logs, so "better" becomes nDCG and recall instead of an opinion in a meeting.
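
For readers who want to see the mechanics, here is a minimal sketch of the fusion step, assuming each retriever (BM25, dense, image) returns an ordered list of product ids. The function name rrf_fuse and the sample ids are hypothetical, not taken from any client system. The value k=60 is the commonly used default for Reciprocal Rank Fusion, and it is the knob that gets tuned against real queries.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. A larger k flattens the gap between
    top and lower ranks; a smaller k rewards top positions more.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists from three retrieval signals.
bm25 = ["sku-3", "sku-1", "sku-7"]
dense = ["sku-1", "sku-9", "sku-3"]
image = ["sku-9", "sku-1", "sku-4"]

fused = rrf_fuse([bm25, dense, image], k=60)
print(fused)
```

Because RRF uses only ranks, the three signals never need their raw scores normalized against each other. Whether the default k suits a given catalog is an empirical question, answered by sweeping k against a labeled query set.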

Numbers, not adjectives

Nova's catalog: 7 million products, three retrieval signals fused with RRF, image search through SigLIP. nDCG@10 went from 0.61 to 0.74 measured against the real query distribution — and the eval harness is what made every tuning decision defensible instead of arguable.
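
For reference, nDCG@10 can be computed roughly as follows. This is a sketch under stated assumptions: graded relevance labels per query (0 means irrelevant) and the linear-gain form of DCG. Some harnesses use the exponential gain 2^rel - 1 instead, and nothing here claims which variant produced the figures above. All names and sample data are hypothetical.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (linear gain)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, labels, k=10):
    """nDCG@k for one query.

    ranked_ids: ids in the order the search system returned them.
    labels: dict mapping id -> graded relevance; unlabeled ids count as 0.
    """
    gains = [labels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(labels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def mean_ndcg(runs, k=10):
    """Average nDCG@k over (ranked_ids, labels) pairs, one per query."""
    return sum(ndcg_at_k(r, l, k) for r, l in runs) / len(runs)


# Hypothetical labels for one query: 3 = exact match, 1 = loosely related.
labels = {"sku-1": 3, "sku-2": 2, "sku-3": 1}
perfect = ndcg_at_k(["sku-1", "sku-2", "sku-3"], labels)
reversed_order = ndcg_at_k(["sku-3", "sku-2", "sku-1"], labels)
print(perfect, reversed_order)
```

Averaging over queries sampled from the logs, rather than over hand-picked ones, is what ties the number to the real query distribution.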

Where we'd start

Discovery: I evaluate your current search against your real query distribution — sample queries from the logs, label the results, measure. You get numbers on where it fails and a fix list ordered by impact, not by what's fashionable.