Rob Young wrote:
mark harwood wrote:
I'd be more inclined to guess that kylie->klyie falls
below the 0.5f similarity threshold you pass.
Try printing out the result of
fuzzyQuery.rewrite(indexReader).toString();
This will rewrite the fuzzyQuery to a BooleanQuery
which explicitly lists the TermQuery objects that the
fuzzyQuery has found potential matches for in your
index.
Hey, thanks for the fuzzyQuery.rewrite tip, I'll try that out to see
what's going on. Regarding the theory about falling below the 0.5f
threshold, that's not the case because new FuzzyQuery( new Term( ...
), 0.5f ) on its own matches. I'll see what I can find out with your
rewrite tip though :)
Ahahahaha!! Thank you, you were right after all. I didn't realize that
once you set the fuzzy prefix length the threshold only applies to the
_remainder_ of the string. That, of course, means that a candidate whose
first letter already matches ends up with a lower similarity once the
fuzzy prefix length is applied, because the same number of edits is
measured against a shorter string.
I must say, this isn't explained particularly well in the docs (not that
I've explained it much better above).
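To put numbers on the prefix effect, here is a minimal sketch in plain Java. It is not Lucene's code: the class and method names are made up for illustration, and the formula (1 minus edit distance divided by the length of the shorter remainder) is my assumption about what matches the behaviour I saw, so check it against the source of the Lucene version you are running.

```java
// Sketch only: mimics the behaviour described in this thread, not Lucene's own code.
final class PrefixSimilaritySketch {

    // Plain Levenshtein distance (insert, delete, substitute; no transpositions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed formula: 1 - distance / (length of the shorter remainder),
    // where the remainder is whatever follows the fixed prefix.
    static double similarity(String query, String candidate, int prefixLength) {
        int p = Math.min(prefixLength, query.length());
        // Candidates that do not share the fixed prefix are never considered.
        if (!candidate.startsWith(query.substring(0, p))) {
            return 0.0;
        }
        String q = query.substring(p);
        String c = candidate.substring(p);
        int shorter = Math.min(q.length(), c.length());
        if (shorter == 0) {
            return q.equals(c) ? 1.0 : 0.0;
        }
        return 1.0 - (double) levenshtein(q, c) / shorter;
    }

    public static void main(String[] args) {
        // Whole strings compared: 2 edits over 5 characters.
        System.out.println(similarity("kylie", "klyie", 0));
        // Prefix length 1: the same 2 edits over only 4 characters.
        System.out.println(similarity("kylie", "klyie", 1));
    }
}
```

Under these assumptions kylie vs klyie scores about 0.6 with no prefix and 0.5 with a prefix length of 1. If the match test is a strict "greater than 0.5", the second case just misses, which would explain what I was seeing.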
Well, thanks all. My fuzzy results are still a little funny but at least
I have the prefix headache sorted.
One thing I was thinking of doing was checking the character frequency
and scoring on that somehow as well, i.e. klyie has one k, one l, one y,
etc., as does kylie, but katie (another one which matches on Levenshtein
alone) doesn't, so klyie would rank higher. Has this been done before?
Would it be possible? If so, whereabouts should I look in "Lucene in
Action" or on the net?
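To make the idea concrete, here is a rough sketch of the kind of character-frequency score I have in mind, in plain Java with no Lucene involved. The class name, the method names and the scoring formula (shared character count divided by the longer length) are all just my own guesses at something workable, not an existing API or an established measure.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a hypothetical character-frequency score, independent of Lucene.
final class CharFrequencySketch {

    // Count how often each character occurs in the string.
    static Map<Character, Integer> counts(String s) {
        Map<Character, Integer> result = new HashMap<>();
        for (char ch : s.toCharArray()) {
            result.merge(ch, 1, Integer::sum);
        }
        return result;
    }

    // Shared characters (respecting multiplicity) divided by the longer length.
    // 1.0 means both strings use exactly the same letters the same number of times.
    static double overlap(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0;
        }
        Map<Character, Integer> countsB = counts(b);
        int shared = 0;
        for (Map.Entry<Character, Integer> e : counts(a).entrySet()) {
            shared += Math.min(e.getValue(), countsB.getOrDefault(e.getKey(), 0));
        }
        return (double) shared / longer;
    }

    public static void main(String[] args) {
        System.out.println(overlap("kylie", "klyie")); // same letters, different order
        System.out.println(overlap("kylie", "katie")); // only k, i and e are shared
    }
}
```

With this formula klyie gets a full score against kylie, while katie only shares three of the five letters, so the two candidates that tie on edit distance would be separated. How to fold such a number into the actual ranking is the part I don't know.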
Many thanks
Rob