Re: Fuzzy Query Similarity

2022-07-12 Thread Mike Drob
On Mon, Jul 11, 2022 at 3:36 PM Mike Drob wrote:
> Hi Uwe, thanks for all the pointers!
>
> I tried using BooleanSimilarity and the resulting scores were even more
> divergent! 1.0 for the exact match vs 1.55 (= 0.8 + 0.75) for the multiple
> terms that were close. Which makes sense with ignoring

Re: Fuzzy Query Similarity

2022-07-11 Thread Mike Drob
Hi Uwe, thanks for all the pointers! I tried using BooleanSimilarity and the resulting scores were even more divergent! 1.0 for the exact match vs 1.55 (= 0.8 + 0.75) for the multiple terms that were close. Which makes sense with ignoring TF but still doesn't help me down-boost the other terms.
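The 1.0 vs 1.55 arithmetic can be reproduced with a small standalone sketch. It does not call Lucene. It assumes each matching term contributes a boost of 1 - editDistance / min(queryLength, termLength) and that, with TF and IDF ignored, a document's score is the sum of those boosts. The indexed terms "sparks" and "spar" are hypothetical, picked only because they yield 0.8 and 0.75 under that assumption; the thread does not say which terms actually matched.

```java
import java.util.Arrays;
import java.util.List;

public class FuzzyBoostSketch {

    // Plain Levenshtein distance (insert, delete, substitute) using two rows.
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMPTION: per-term boost is 1 - distance / (length of the shorter term).
    public static double boost(String queryTerm, String indexedTerm) {
        int ed = editDistance(queryTerm, indexedTerm);
        return 1.0 - (double) ed / Math.min(queryTerm.length(), indexedTerm.length());
    }

    // ASSUMPTION: with TF and IDF ignored, a document's score is the sum of the
    // boosts of its terms that lie within maxEdits of the query term.
    public static double docScore(String queryTerm, List<String> docTerms, int maxEdits) {
        double sum = 0.0;
        for (String term : docTerms) {
            if (editDistance(queryTerm, term) <= maxEdits) {
                sum += boost(queryTerm, term);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical documents: one exact match, one with two near matches.
        double exact = docScore("spark", Arrays.asList("spark"), 2);
        double close = docScore("spark", Arrays.asList("sparks", "spar"), 2);
        System.out.printf("exact=%.2f close=%.2f%n", exact, close);
    }
}
```

Under these assumptions the document with two near matches outscores the exact match simply because two boosts are added together, which is the behaviour described above.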

Re: Fuzzy Query Similarity

2022-07-09 Thread Michael Sokolov
Oh good! Thanks for clarifying, Uwe

On Sat, Jul 9, 2022, 12:23 PM Uwe Schindler wrote:
> Hi
> > FuzzyQuery/MultiTermQuery and I don't see any way to "boost" exact
> > matches, or even to incorporate the edit distance more generally into
> > the per-term score, although it does seem like that wou

Re: Fuzzy Query Similarity

2022-07-09 Thread Uwe Schindler
Hi

> FuzzyQuery/MultiTermQuery and I don't see any way to "boost" exact matches, or even to incorporate the edit distance more generally into the per-term score, although it does seem like that would be something people would generally expect.

Actually it does this:

* By default FuzzyQuery uses

Re: Fuzzy Query Similarity

2022-07-09 Thread Uwe Schindler
The problem is that the query combines the native TermQuery score (which depends on the length of the document and the term's statistics) with the edit distance, which is multiplied in. When the difference in term statistics is too large, the edit distance no longer matters. This is perfectly fine and also happen
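A rough numeric sketch of that effect, not using Lucene itself: it assumes the per-term score is a BM25-style IDF multiplied by the edit-distance boost, with TF and length normalization left out (treated as 1). The document counts and frequencies are made up purely for illustration, and 0.8 stands in for the boost of a near-match term.

```java
public class FuzzyStatsSketch {

    // Simplified BM25-style IDF. ASSUMPTION: TF and length normalization are
    // ignored here so that only the term statistics and the boost vary.
    public static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Per-term score: base term score with the edit-distance boost multiplied in.
    public static double score(long docCount, long docFreq, double editBoost) {
        return idf(docCount, docFreq) * editBoost;
    }

    public static void main(String[] args) {
        long docCount = 1000; // hypothetical index size

        // Exact match on a common term: boost 1.0, appears in 500 documents.
        double exact = score(docCount, 500, 1.0);

        // Near match on a rare term: boost 0.8, appears in 1 document.
        double near = score(docCount, 1, 0.8);

        System.out.printf("exact=%.3f near=%.3f%n", exact, near);
    }
}
```

With these made-up statistics the rare near-match term scores well above the common exact match, so the 0.8 factor is swamped by the difference in IDF.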

Re: Fuzzy Query Similarity

2022-07-09 Thread Michael Sokolov
I am no expert with this, but I got curious and looked at FuzzyQuery/MultiTermQuery and I don't see any way to "boost" exact matches, or even to incorporate the edit distance more generally into the per-term score, although it does seem like that would be something people would generally expect. So

Fuzzy Query Similarity

2022-07-08 Thread Mike Drob
Hi folks, I'm working with some fuzzy queries and trying my best to understand what the expected behaviour of the searcher is. I'm not sure if this is a similarity bug or incorrect usage on my end. The problem is that when I do a fuzzy search for a term "spark~", then instead of matching documents w