Re: Funny results with Fuzzy

mark harwood Tue, 25 Oct 2005 10:14:20 -0700

> One thing I was thinking of doing was checking the
> character frequency


An alternative idea is index-time fuzzification rather
than query-time. This is documented in one of the case
studies in LIA - the principle is you don't
index/search for whole words but use an NGram Analyzer
to break them up at index time:

Kylie becomes multiple words:
[ k]
[ ky]
[ kyl]
[ky]
[kyl]
[kyli]
[yl]
[yli]
[ylie]
[ kylie ]

Obviously you use the same analyzer to process
queries.
Lucene will automatically look after relevancy of
partial matches for you but your indexes are bigger
and your queries will generate many more Boolean
clauses.





        
        
                
___________________________________________________________ 
Yahoo! Messenger - NEW crystal clear PC to PC calling worldwide with voicemail 
http://uk.messenger.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: Funny results with Fuzzy

Reply via email to