Hello Chip,

Thank you SO MUCH for providing this write-up. This is gold for people like me who are still learning the ropes of hybrid scoring.

Is there interest in a more detailed write-up on the mm tuning for hybrid scenarios?

+1.  I would certainly be interested in learning from your experience.

Cheers!
Corrado

On 2 Dec 2025, at 8:49, Chip Ryan via users wrote:

Hi everyone,

I've been running OpenSolr.com (hosted Solr) for about 10 years, and recently implemented hybrid dense vector + lexical search. Wanted to share what worked and maybe get feedback from the community.

The problem:

When combining knn vector scores (0-1 range) with edismax scores (unbounded, often 10-50+), naive addition doesn't work. Lexical dominates every time, even when semantically wrong.

The solution:

Saturation normalization on lexical scores:

{!func}sum(
  product($vector_weight, query($vectorQuery)),
  product($lexical_weight, div(query($lexicalQuery), sum(query($lexicalQuery), $k)))
)

The div(score, score + k) term, i.e. score / (score + k), maps any non-negative lexical score into the 0-1 range. With k=10:

lexical 10 → 0.50
lexical 20 → 0.67
lexical 50 → 0.83
Now vectors and lexical compete fairly.
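
For anyone who wants to sanity-check these numbers outside Solr, here is a minimal Python sketch of the same arithmetic. It is not the Solr implementation: the function names, the equal 0.5 weights, and the two example documents are assumptions made up for illustration, since the post does not state actual weight values.

```python
def saturate(score: float, k: float = 10.0) -> float:
    """Map a non-negative lexical score into [0, 1) via score / (score + k)."""
    if score <= 0:
        return 0.0  # treat a non-matching document as contributing nothing
    return score / (score + k)


def hybrid_score(
    vector_score: float,
    lexical_score: float,
    vector_weight: float = 0.5,   # assumed value, not from the post
    lexical_weight: float = 0.5,  # assumed value, not from the post
    k: float = 10.0,
) -> float:
    """Weighted sum of the vector score and the saturated lexical score."""
    return vector_weight * vector_score + lexical_weight * saturate(lexical_score, k)


if __name__ == "__main__":
    # Reproduce the k=10 table from the post.
    for s in (10, 20, 50):
        print(f"lexical {s} -> {saturate(s):.2f}")

    # Hypothetical documents: A is semantically close, B has a high lexical score.
    doc_a = {"vector": 0.9, "lexical": 12.0}
    doc_b = {"vector": 0.2, "lexical": 30.0}
    for name, doc in (("A", doc_a), ("B", doc_b)):
        naive = doc["vector"] + doc["lexical"]
        hybrid = hybrid_score(doc["vector"], doc["lexical"])
        print(f"doc {name}: naive={naive:.2f} hybrid={hybrid:.3f}")
```

With naive addition, document B ranks first purely on its lexical score; after saturation, the vector similarity of document A is large enough to put it ahead. A smaller k makes the lexical side saturate faster, a larger k keeps it closer to linear for typical scores.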

Other pieces:

Using paraphrase-multilingual-MiniLM-L12-v2 for embeddings (CPU, no GPU)
mm="3<90% 5<75% 8<60% 12<50%" for minimum match tuning
Emoji queries via emoji.demojize() before embedding (🔥 → "fire")
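
To make the mm spec in the list above a little more concrete, here is a rough Python sketch of how a conditional spec like this is resolved, following the semantics described in the Solr reference guide: all clauses are required up to the first threshold, then the percentage attached to the highest threshold exceeded applies, rounded down. The helper name min_should_match is hypothetical, it only handles the positive "N<P%" form used here with thresholds in ascending order, and Solr's own parser remains the authority, so debugQuery is the place to confirm real behaviour.

```python
def min_should_match(spec: str, clause_count: int) -> int:
    """Approximate the number of optional clauses that must match.

    Supports only conditionals of the form "N<P%" with positive percentages,
    listed in ascending order of N.
    """
    required = clause_count  # at or below the first threshold, every clause is required
    for part in spec.split():
        threshold, _, pct = part.partition("<")
        if clause_count <= int(threshold):
            break
        # The percentage result is rounded down (integer floor division).
        required = clause_count * int(pct.rstrip("%")) // 100
    return required


MM_SPEC = "3<90% 5<75% 8<60% 12<50%"

if __name__ == "__main__":
    for n in (2, 4, 6, 10, 15):
        print(f"{n} clauses -> {min_should_match(MM_SPEC, n)} required")
```

Read this way, short queries stay strict (every term must match up to three terms) while long, conversational queries only need roughly half their terms, which leaves more room for the vector side to contribute.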
Live demos with debug inspector:

I exposed the full debugQuery output so you can see exactly what's happening:

Cross-lingual (EN→RO): https://opensolr.com/search/dedeman?q=pellet+heater
Emoji search: https://opensolr.com/search/vector?q=🔥 <https://opensolr.com/search/vector?q=%F0%9F%94%A5>
Semantic matching: https://opensolr.com/search/peilishop?q=stuff+to+wear+around+my+neck

Click the Debug button on any search to see params, parsed query, and explain output.

Questions for the community:

Anyone else doing hybrid scoring differently? Curious about other normalization approaches.
Is there interest in a more detailed write-up on the mm tuning for hybrid scenarios?
Any gotchas with knn + function queries I should watch for at scale?
Happy to share more implementation details if useful.

Cheers,

Chip

OpenSolr.com

