Rob Young wrote:

mark harwood wrote:

I'd be more inclined to guess that kylie->klyie falls
below the 0.5f similarity threshold you pass.

Try printing out the result of
fuzzyQuery.rewrite(indexReader).toString();

This will rewrite the fuzzyQuery to a BooleanQuery
which explicitly lists the TermQuery objects that the
fuzzyQuery has found potential matches for in your
index.

Hey, thanks for the fuzzyQuery.rewrite tip, I'll try that out to see what's going on. Regarding the theory about falling below the 0.5f threshold, that's not the case, because new FuzzyQuery( new Term( ... ), 0.5f ) on its own matches. I'll see what I can find out with your rewrite tip though :)

Ahahahaha!! Thank you, you were right after all. I didn't realize that once you set the fuzzy prefix length, the threshold applies only to the _remainder_ of the string. That, of course, means a search string whose first letter already matches ends up with a lower similarity once the prefix length is applied, because the matching letter no longer counts toward the score.

I must say, this isn't explained particularly well in the docs (not that I've explained it much better above).
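
For anyone else who trips over this, here is a rough sketch of the effect in plain Java. It does not call Lucene at all. It assumes the similarity is 1 - (Levenshtein distance / length of the shorter string), computed only on the text after the prefix, which is how the behaviour looks from the outside; the class and method names are made up for illustration.

```java
// Sketch only: mimics the behaviour described in this thread, not Lucene's actual source.
class FuzzyPrefixSketch {

    // Classic Levenshtein distance (insert, delete, substitute; no transpositions).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // ASSUMPTION: similarity = 1 - distance / min(length), measured after the prefix.
    // Terms that do not share the prefix are treated as non-matches (similarity 0).
    static float similarity(String query, String term, int prefixLength) {
        int p = Math.min(prefixLength, Math.min(query.length(), term.length()));
        if (!query.regionMatches(0, term, 0, p)) {
            return 0f;
        }
        String q = query.substring(p);
        String t = term.substring(p);
        int shorter = Math.min(q.length(), t.length());
        if (shorter == 0) {
            return q.length() == t.length() ? 1f : 0f;
        }
        return 1f - (float) levenshtein(q, t) / shorter;
    }

    public static void main(String[] args) {
        // Whole string: distance 2 over 5 characters.
        System.out.println(similarity("kylie", "klyie", 0));
        // Prefix length 1: distance 2 over the remaining 4 characters.
        System.out.println(similarity("kylie", "klyie", 1));
    }
}
```

With no prefix the pair scores about 0.6, which clears a 0.5 threshold. With a prefix length of 1 the same two edits are spread over only four characters, giving exactly 0.5, which no longer passes if the comparison against the threshold is strict.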

Well, thanks all. My fuzzy results are still a little funny but at least I have the prefix headache sorted.

One thing I was thinking of doing was checking the character frequency and scoring on that somehow as well. I.e. klyie has one k, one l, one y, etc., as does kylie, but katie (another one which matches on Levenshtein distance alone) doesn't, so klyie would rank higher. Has this been done before? Would it be possible? If so, whereabouts should I look in "Lucene in Action" or on the net?
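
To make the idea concrete, here is a minimal sketch of the kind of character-frequency score I have in mind, in plain Java. It is not tied to any Lucene API; the class name, method names, and the choice to divide by the longer string's length are all just assumptions for illustration. It would be applied as a second pass over the terms that fuzzy matching has already returned.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: scores two strings by how many characters they share,
// counting repeats, regardless of the order the characters appear in.
class CharFrequencySketch {

    // Count how often each character occurs in the string.
    static Map<Character, Integer> counts(String s) {
        Map<Character, Integer> result = new HashMap<>();
        for (char c : s.toCharArray()) {
            result.merge(c, 1, Integer::sum);
        }
        return result;
    }

    // Shared characters (with multiplicity) divided by the longer string's length.
    // 1.0 means the two strings are anagrams of each other.
    static double overlap(String a, String b) {
        Map<Character, Integer> ca = counts(a);
        Map<Character, Integer> cb = counts(b);
        int shared = 0;
        for (Map.Entry<Character, Integer> e : ca.entrySet()) {
            shared += Math.min(e.getValue(), cb.getOrDefault(e.getKey(), 0));
        }
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : (double) shared / longer;
    }

    public static void main(String[] args) {
        System.out.println(overlap("kylie", "klyie")); // same letters, different order
        System.out.println(overlap("kylie", "katie")); // shares only k, i and e
    }
}
```

Both klyie and katie are two substitutions away from kylie, so plain Levenshtein cannot tell them apart. The overlap score is 1.0 for klyie and 3 out of 5 for katie, so it could serve as a tie-breaker when ordering the fuzzy matches.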

Many thanks
Rob
