Hi Boris,
Query parsing and scoring/ranking are completely separate processes,
so I'd debug those problems separately.
To debug a fuzzy query, the Query.rewrite() method would be a good
first step (it lets you see all the expanded terms generated by the
fuzzy query).
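To make the idea of "expanded terms" concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: it assumes a small, made-up term list and uses plain Levenshtein distance, whereas Lucene's fuzzy matching may also count transpositions as a single edit. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a stand-in for what a fuzzy query expands to,
// not Lucene's implementation.
class FuzzyExpansionSketch {

    // Classic Levenshtein distance (insert, delete, substitute),
    // computed with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(best, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the indexed terms within maxEdits of the query term,
    // in the order they appear in indexedTerms.
    static List<String> expand(String queryTerm, List<String> indexedTerms, int maxEdits) {
        List<String> matches = new ArrayList<>();
        for (String term : indexedTerms) {
            if (levenshtein(queryTerm, term) <= maxEdits) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical index terms, chosen only for illustration.
        List<String> terms = List.of("acer", "campestre", "rozi", "rosa");
        System.out.println(expand("rozy", terms, 2));
    }
}
```

Comparing such a list with what the rewritten query actually contains can show whether an unexpected term is being matched at all, before looking at scores.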
I'm not sure what your pr
Hi Boris,
"Acer campestre 'Rozi'" now receives a higher score with DFISimilarity
and BM25Similarity (with tuned 'b') instead of the standard BM25.
It really was a scoring/normalization issue: while "Rozi" got a
higher score, "Acer" and "campestre" received lower values and the
combined result
These are great suggestions; I was going to suggest the explain plan of
the query, too.
I really wonder why, in your case, the 'Rozi' entry does not get a higher
score. Is there any effect from the " ' " chars?
In my case I have a sort of reverse situation:
my query is maink~2 (mains was a special case where I st
Hi Namgyu and Tomoko,
your hint towards Explanation was very helpful and I was not aware of
this feature.
I have now experimented with different scoring functions and it seems
that DFISimilarity and BM25Similarity (with lower 'b') produce results
in the direction I prefer, though not perfect for
Hi Matthias,
What similarity class are you using?
Just a guess... but one possible reason is document (field) length
normalization. Generally speaking, shorter documents get higher
scores than longer documents. (I saw that classic TFIDF similarity
tends to give much higher scores to shorter
Dear Matthias,
First, you need to know about Lucene's ranking concept.
Lucene's default ranking is BM25, and it depends on the state of your index.
(https://en.wikipedia.org/wiki/Okapi_BM25)
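As a rough illustration of how BM25 ties the score to field length, here is a minimal sketch of the per-term formula as given in the Wikipedia article above. It is not Lucene's BM25Similarity code, which may differ in constant factors and in how it stores field lengths. The document counts and field lengths are made-up values; k1 = 1.2 and b = 0.75 are the usual defaults.

```java
// Illustration only: textbook BM25 for a single term, with hypothetical numbers.
class Bm25Sketch {

    // Inverse document frequency: rarer terms get a larger weight.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 score of one term in one field.
    // b controls how strongly the field length (relative to the average)
    // dampens the score; b = 0 switches length normalization off.
    static double termScore(double tf, double fieldLen, double avgFieldLen,
                            double idf, double k1, double b) {
        double norm = k1 * (1.0 - b + b * fieldLen / avgFieldLen);
        return idf * tf * (k1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        double idf = idf(1000, 10); // hypothetical: term occurs in 10 of 1000 docs
        double k1 = 1.2;
        for (double b : new double[] {0.75, 0.25, 0.0}) {
            // Same term frequency, average field length 3 tokens:
            double shortField = termScore(1, 2, 3, idf, k1, b); // 2-token field
            double longField = termScore(1, 4, 3, idf, k1, b);  // 4-token field
            System.out.printf("b=%.2f short=%.4f long=%.4f%n", b, shortField, longField);
        }
    }
}
```

With b > 0 the shorter field scores higher for the same term frequency, and the gap narrows as b is lowered; at b = 0 both fields score the same.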
There can be many reasons.
One thing I can guess is that your index has a lot of 'rozi' terms, so it
is getting
I would suggest trying (indexing and searching) without the ' characters
and seeing whether you can find it first.
Thanks
On 6/13/19 11:25 AM, Matthias Müller wrote:
I am currently matching botanic names (with possible misspellings)
against an indexed reference list with Lucene. After quick progress in
the beginning, I am struggling with the proper query design to achieve
the ranking result I want.
Here is an example:
Search term: Acer campestre 'Rozi'
Toke