Re: Hybrid vector + lexical search in Solr 9.x - score normalization approach

Chip Ryan via users Wed, 03 Dec 2025 21:48:07 -0800

Hi Corrado / Everyone,

Glad you found it helpful! Happy to share some thoughts on mm tuning since it's 
been one of the trickier knobs to get right in our hybrid setups.

The classic struggle with mm is finding that sweet spot between being too 
strict and too permissive. Set it too high (like 100%) and you lose recall fast 
— users misspell one word or include an extra term and suddenly get zero 
results. Set it too low and you're drowning in noise.

We've had good luck with patterns like mm=2<-1 5<-2 8<80%, which says: require 
all terms for queries of 1-2 terms, allow one miss for 3-5 terms, allow two 
misses for 6-8 terms, and require 80% (rounded down) beyond that. The logic 
being that longer queries often contain throwaway words the user doesn't 
consider essential.
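
To sanity-check an mm spec before deploying it, a minimal Python sketch along 
these lines can show how many terms end up required at each query length. It 
follows the documented mm semantics (conditional "N<rule" parts, percentages 
rounded down, negative values meaning "all but N"); it is an approximation for 
experimenting, not Solr's actual code, and the function names are made up for 
illustration.

```python
def _apply_rule(num_clauses: int, rule: str) -> int:
    """Resolve a single non-conditional rule such as '3', '-2', '80%' or '-25%'."""
    if rule.endswith("%"):
        percent = int(rule[:-1])
        # Percentages are rounded down (truncated toward zero).
        amount = abs(num_clauses * percent) // 100
        result = num_clauses - amount if percent < 0 else amount
    else:
        value = int(rule)
        # A negative integer means "all but this many".
        result = num_clauses + value if value < 0 else value
    # Never require fewer than 0 or more than the clauses available.
    return max(0, min(num_clauses, result))


def min_should_match(num_clauses: int, spec: str) -> int:
    """Approximate how many optional clauses must match under an mm spec."""
    spec = spec.strip()
    if "<" not in spec:
        return _apply_rule(num_clauses, spec)

    required = num_clauses  # at or below the first bound, everything is required
    for part in spec.split():
        bound, rule = part.split("<", 1)
        if num_clauses <= int(bound):
            return required
        # Above this bound, the rule applies unless a later bound is also exceeded.
        required = _apply_rule(num_clauses, rule)
    return required


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, min_should_match(n, "2<-1 5<-2 8<80%"))
```

For the spec above, queries of 1 to 10 terms come out as 1, 2, 2, 3, 4, 4, 5, 
6, 7 and 8 required terms respectively.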

Where it gets interesting is how mm interacts with pf, pf2, pf3 and ps. Even 
with a relaxed mm, you can still reward documents that contain the full phrase 
(or close to it) using phrase boosting. So a document matching 3 out of 5 terms 
might rank, but a document matching those same 3 terms as a phrase gets boosted 
above it.

We lean on pf2 and pf3 heavily — bigram and trigram matching catches partial 
phrase matches that pf alone would miss. And ps (phrase slop) gives you wiggle 
room for word order variations without losing the intent signal.
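
As a rough illustration of how these parameters sit together in one request, 
here is a small Python sketch that builds an edismax parameter set. The 
parameter names (defType, qf, mm, pf, pf2, pf3, ps) are standard edismax 
parameters; the field names "title" and "body" and all boost values are 
hypothetical placeholders, not settings from our production configs.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query: str) -> dict:
    """Assemble edismax parameters combining a relaxed mm with phrase boosts."""
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": "title^3 body",        # hypothetical fields and boosts
        "mm": "2<-1 5<-2 8<80%",     # relaxed matching for longer queries
        "pf": "title^10 body^4",     # boost docs containing the whole query as a phrase
        "pf2": "title^6 body^2",     # boost adjacent word pairs (bigrams)
        "pf3": "title^8 body^3",     # boost adjacent word triples (trigrams)
        "ps": "2",                   # phrase slop applied to the pf boost
    }


params = build_edismax_params("cast iron pellet heater")
query_string = urlencode(params)
print(query_string)
```

The resulting query string can be appended to a collection's /select handler. 
Documents still have to satisfy mm to match at all; the pf, pf2 and pf3 boosts 
only reorder the documents that did match.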

In hybrid scenarios, this matters because your vector results don't care about 
exact term matching — they're all about semantic similarity. So your lexical 
side needs to pull its weight on precision, and that's where well-tuned mm + 
phrase boosting shines.
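
To make the scoring side concrete, here is a small Python sketch of the 
saturation formula from the original post (quoted below), score / (score + k), 
combined with a vector score. The weights of 1.0 and the example scores are 
illustrative assumptions; only k=10 and the formula itself come from the post.

```python
def saturate(lexical_score: float, k: float = 10.0) -> float:
    """Map an unbounded, non-negative lexical score into the 0-1 range."""
    if k <= 0:
        raise ValueError("k must be positive")
    return lexical_score / (lexical_score + k)


def hybrid_score(vector_score: float, lexical_score: float,
                 vector_weight: float = 1.0, lexical_weight: float = 1.0,
                 k: float = 10.0) -> float:
    """Weighted sum of the vector score and the saturated lexical score."""
    return (vector_weight * vector_score
            + lexical_weight * saturate(lexical_score, k))


# Strong semantic match with a weak lexical score...
semantic_hit = hybrid_score(vector_score=0.9, lexical_score=5.0)
# ...versus a weak semantic match with a strong lexical score.
lexical_hit = hybrid_score(vector_score=0.2, lexical_score=50.0)
print(round(semantic_hit, 3), round(lexical_hit, 3))
```

With naive addition the second document would win by a wide margin (50.2 vs 
5.9); after saturation the first one ranks higher, so the lexical side only 
wins when its precision signals (mm plus phrase boosts) are strong relative to 
the semantic score.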

Happy to dive deeper on any of this if useful!

Cheers,

Chip

Opensolr.com
Your Path to AI Search 
<https://opensolr.com/faq/view/web-crawler/46/Opensolr-Web-Crawler-Site-Search-Solution>
https://opensolr.com <https://opensolr.com/>
VAT: RO-35410526

> On 3 Dec 2025, at 07:48, Corrado Fiore wrote:
> 
> Hello Chip,
> 
> thank you SO MUCH for providing this write-up. This is gold for people like 
> me who are still learning the ropes of hybrid scoring.
> 
> > Is there interest in a more detailed write-up on the mm tuning for hybrid 
> > scenarios?
> 
> +1. I would certainly be interested in learning from your experience.
> 
> Cheers!
> Corrado
> 
> On 2 Dec 2025, at 8:49, Chip Ryan via users wrote:
> 
> Hi everyone,
> 
> I've been running OpenSolr.com (hosted Solr) for about 10 years, and recently 
> implemented hybrid dense vector + lexical search. Wanted to share what worked 
> and maybe get feedback from the community.
> 
> The problem:
> 
> When combining knn vector scores (0-1 range) with edismax scores (unbounded, 
> often 10-50+), naive addition doesn't work. Lexical dominates every time, 
> even when semantically wrong.
> 
> The solution:
> 
> Saturation normalization on lexical scores:
> 
>   {!func}sum(
>     product($vector_weight, query($vectorQuery)),
>     product($lexical_weight,
>       div(query($lexicalQuery), sum(query($lexicalQuery), $k)))
>   )
> 
> The div(score, score + k) maps any lexical score to 0-1 range. With k=10:
> 
> - lexical 10 → 0.50
> - lexical 20 → 0.67
> - lexical 50 → 0.83
> 
> Now vectors and lexical compete fairly.
> 
> Other pieces:
> 
> - Using paraphrase-multilingual-MiniLM-L12-v2 for embeddings (CPU, no GPU)
> - mm="3<90% 5<75% 8<60% 12<50%" for minimum match tuning
> - Emoji queries via emoji.demojize() before embedding (🔥 → "fire")
> 
> Live demos with debug inspector:
> 
> I exposed the full debugQuery output so you can see exactly what's happening:
> 
> - Cross-lingual (EN→RO): https://opensolr.com/search/dedeman?q=pellet+heater
> - Emoji search: https://opensolr.com/search/vector?q=🔥 
>   <https://opensolr.com/search/vector?q=%F0%9F%94%A5>
> - Semantic matching: 
>   https://opensolr.com/search/peilishop?q=stuff+to+wear+around+my+neck
> 
> Click the Debug button on any search to see params, parsed query, and explain 
> output.
> 
> Questions for the community:
> 
> - Anyone else doing hybrid scoring differently? Curious about other 
>   normalization approaches.
> - Is there interest in a more detailed write-up on the mm tuning for hybrid 
>   scenarios?
> - Any gotchas with knn + function queries I should watch for at scale?
> 
> Happy to share more implementation details if useful.
> 
> Cheers, 
> 
> Chip 
> 
> OpenSolr.com
>
