Re: Best fuzzy match on multiple terms

2019-06-14 Thread Tomoko Uchida
Hi Boris, Query parsing and scoring/ranking are completely separate processes, so I'd debug those problems separately. For debugging a fuzzy query, the Query.rewrite() method would be a good first step (it lets you see all the expanded terms generated by the fuzzy query). I'm not sure about what is your pr
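
To make the idea of "unrolled terms" concrete, here is a minimal plain-Java sketch of what a fuzzy query such as maink~2 conceptually expands to: every indexed term within a given edit distance of the query term. This is an illustration only, not Lucene's implementation. The class name, the `expand` helper and the small term dictionary are hypothetical, and it uses plain Levenshtein distance, whereas Lucene's FuzzyQuery works on automata and by default also counts a transposition as a single edit.

```java
import java.util.ArrayList;
import java.util.List;

public class FuzzyExpansionSketch {

    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions, substitutions; no transpositions).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Hypothetical helper: keep every dictionary term within maxEdits of the query term.
    static List<String> expand(String queryTerm, int maxEdits, List<String> dictionary) {
        List<String> matches = new ArrayList<>();
        for (String term : dictionary) {
            if (editDistance(queryTerm, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up stand-in for the terms of an indexed field.
        List<String> dictionary = List.of("main", "mains", "maine", "market", "mink");
        System.out.println(expand("maink", 2, dictionary));
    }
}
```

Seeing the expanded term list first makes it easier to tell whether an unexpected hit comes from the expansion itself or from how the matching terms are scored afterwards.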

Re: Best fuzzy match on multiple terms

2019-06-14 Thread Matthias Müller
Hi Boris, "Acer campestre 'Rozi'" now receives a higher score with DFISimilarity and BM25Similarity (with tuned 'b') instead of the standard BM25. It really was a scoring/normalization issue: while "Rozi" gets a higher score, "Acer" and "campestre" received lower values and the combined result

Re: Can I use DFISimilarity for search on an an index written with BM25Similarity ?

2019-06-14 Thread Adrien Grand
Yes, you can use DFISimilarity with an index constructed with BM25Similarity. No need to reindex.
On Fri, Jun 14, 2019 at 1:05 PM Frédéric Glorieux wrote:
> Hi,
> I'm working on literature texts (French).
> My users are interested in relevance tweaking to have the most suggested texts (fo

Re: Best fuzzy match on multiple terms

2019-06-14 Thread baris . kazar
These are great suggestions; I was going to suggest looking at the explain output of the query, too. I really wonder why, in your case, the 'Rozi' entry does not get a higher score. Is there any effect from the " ' " chars? In my case I have sort of the reverse situation: my query is maink~2 (mains was a special case where i st

Can I use DFISimilarity for search on an an index written with BM25Similarity ?

2019-06-14 Thread Frédéric Glorieux
Hi, I'm working on literature texts (French). My users are interested in relevance tweaking to have the most suggested texts (for their taste) in top results. Changing the similarity at query time is less expensive than reindexing everything. I checked that BM25 needs to write “norms” to keep document lengt

Re: Best fuzzy match on multiple terms

2019-06-14 Thread Matthias Müller
Hi Namgyu and Tomoko, your hint towards Explanation was very helpful and I was not aware of this feature. I have now experimented with different scoring functions and it seems that DFISimilarity and BM25Similarity (with lower 'b') produce results in the direction I prefer, though not perfect for

Re: Best fuzzy match on multiple terms

2019-06-14 Thread Tomoko Uchida
Hi Matthias, What similarity class are you using? Just a guess... but possibly one reason is document (field) length normalization. Generally speaking, shorter documents would get higher scores than longer documents. (I saw that classic TFIDF similarity tends to give much higher scores to shorter
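
The effect of field length normalization can be sketched with the textbook BM25 term score, where the parameter b controls how strongly document length is penalized. This is a rough plain-Java illustration under stated assumptions, not Lucene's BM25Similarity code: the class and method names are hypothetical, the document lengths are made up, and the idf factor is fixed at 1.0 so that only the length effect shows. The values k1 = 1.2 and b = 0.75 are the usual defaults.

```java
public class Bm25LengthNormSketch {

    // Textbook BM25 score of a single term in a single document.
    // b = 0 switches length normalization off; b = 1 applies it fully.
    static double score(double tf, double docLen, double avgDocLen,
                        double k1, double b, double idf) {
        double lengthNorm = 1 - b + b * (docLen / avgDocLen);
        return idf * (tf * (k1 + 1)) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double k1 = 1.2;
        double idf = 1.0;        // held constant to isolate the length effect
        double avgDocLen = 10;   // assumed average field length, in terms
        double shortLen = 3;     // e.g. a three-word plant name
        double longLen = 30;     // e.g. a longer description

        for (double b : new double[] {0.75, 0.3, 0.0}) {
            double shortScore = score(1, shortLen, avgDocLen, k1, b, idf);
            double longScore = score(1, longLen, avgDocLen, k1, b, idf);
            System.out.printf("b=%.2f short=%.4f long=%.4f%n", b, shortScore, longScore);
        }
    }
}
```

With the same term frequency, the shorter field scores higher whenever b > 0, and lowering b narrows the gap between short and long fields.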