Re: Funny results with Fuzzy

Rob Young Tue, 25 Oct 2005 09:56:56 -0700

Rob Young wrote:

mark harwood wrote:
I'd be more inclined to guess that kylie->klyie falls
below the 0.5f similarity threshold you pass.

Try printing out the result of
fuzzyQuery.rewrite(indexReader).toString();

This will rewrite the fuzzyQuery to a BooleanQuery
which explicitly lists the TermQuery objects that the
fuzzyQuery has found potential matches for in your
index.
Hey, thanks for the fuzzyQuery.rewrite tip, I'll try that out to see what's going on. Regarding the theory about falling below the 0.5f threshold, that's not the case because new FuzzyQuery( new Term( ...), 0.5f ) on its own matches. I'll see what I can find out with your rewrite tip though :)

Ahahahaha!! Thank you, you were right after all. I didn't realize that once you set the fuzzy prefix length the threshold only applies to the _remainder_ of the string, which, of course, means that a search string whose first letter matches by default has a lower similarity after the fuzzy prefix length is applied.
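
To make the effect concrete, here is a rough sketch in plain Java. It is not Lucene's source. It assumes the similarity is 1 - (edit distance / length of the shorter string), measured only on the text after the prefix, and that a term matches only when the similarity is strictly above the threshold. The class and method names are made up for illustration.

```java
// Sketch only: assumes similarity = 1 - distance / min(length), measured on
// the text that follows the fixed prefix. This is not Lucene's actual code.
class FuzzyPrefixSketch {

    // Plain Levenshtein distance (insert, delete, substitute; no transpositions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity of the remainders after skipping prefixLength characters.
    // Returns 0 when the term does not share the query's prefix.
    static float similarity(String query, String term, int prefixLength) {
        if (query.length() < prefixLength || term.length() < prefixLength) {
            return 0f;
        }
        if (!term.startsWith(query.substring(0, prefixLength))) {
            return 0f;
        }
        String q = query.substring(prefixLength);
        String t = term.substring(prefixLength);
        int shorter = Math.min(q.length(), t.length());
        if (shorter == 0) {
            return q.length() == t.length() ? 1f : 0f;
        }
        return 1.0f - (float) levenshtein(q, t) / shorter;
    }

    public static void main(String[] args) {
        // No prefix: distance 2 over 5 characters.
        System.out.println(similarity("kylie", "klyie", 0));
        // Prefix of 1: "ylie" vs "lyie", distance 2 over 4 characters.
        System.out.println(similarity("kylie", "klyie", 1));
    }
}
```

Under those assumptions, kylie against klyie scores about 0.6 with no prefix, but exactly 0.5 with a prefix length of 1, which is no longer above a 0.5f threshold.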

I must say, this isn't explained particularly well in the docs (not that I've explained it much better above).

Well, thanks all. My fuzzy results are still a little funny but at least I have the prefix headache sorted.

One thing I was thinking of doing was checking the character frequency and scoring on that somehow as well. I.e. klyie has one k, one l, one y etc., as does kylie, but katie (another one which matches on Levenshtein alone) doesn't, so klyie would rank higher. Has this been done before? Would it be possible? If so, whereabouts should I look in "Lucene in Action" or on the net?
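
A minimal sketch of the character frequency idea, in plain Java and independent of Lucene. It counts how many characters two strings share (respecting repeats) and divides by the longer length. The class name, method names, and the choice of normalisation are assumptions for illustration, not an existing Lucene feature.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a hypothetical secondary score based on shared character counts.
class CharFrequencySketch {

    // Count how often each character occurs in s.
    static Map<Character, Integer> counts(String s) {
        Map<Character, Integer> result = new HashMap<>();
        for (char c : s.toCharArray()) {
            result.merge(c, 1, Integer::sum);
        }
        return result;
    }

    // Fraction of characters the two strings have in common, counting repeats,
    // relative to the longer string. 1.0 means same letters in any order.
    static float overlap(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1f;
        }
        Map<Character, Integer> ca = counts(a);
        Map<Character, Integer> cb = counts(b);
        int shared = 0;
        for (Map.Entry<Character, Integer> e : ca.entrySet()) {
            shared += Math.min(e.getValue(), cb.getOrDefault(e.getKey(), 0));
        }
        return (float) shared / longer;
    }

    public static void main(String[] args) {
        System.out.println(overlap("kylie", "klyie")); // same letters, reordered
        System.out.println(overlap("kylie", "katie")); // shares only k, i, e
    }
}
```

With this measure klyie shares all five letters with kylie while katie shares three, so the score could be used to break ties between terms that sit at the same edit distance.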


Many thanks
Rob

