See below - smokey <[EMAIL PROTECTED]> wrote on 03/12/2007 05:14:23:
> Suppose I have an index containing the terms impostor, imposter, fraud, and
> fruad. Then, presumably, regardless of whether I spell impostor and fraud
> correctly, Lucene SpellChecker will offer the misspelled versions as
> corrections. This means that the phrase "The login fraud involves an
> impostor" would need to expand to:
>
> "The login fraud involves an impostor" OR "The login fruad involves an
> impostor" OR "The login fraud involves an imposter" OR "The login fruad
> involves an imposter"
>
> to cover all cases and thus find all possible matches.
>
> However, that feels like an awful lot of matches to perform on the index.
> A more efficient approach would be to expand the query to "The login
> (fraud OR fruad) involves an (impostor OR imposter)", which should be
> logically equivalent to the first (longer) query.
>
> So my questions are:
> (1) Have others generated "The login (fraud OR fruad) involves an
> (impostor OR imposter)"-style queries when applying SpellChecker to a
> phrase, and found that this indeed performs better than the first form?
> (2) Have others observed any problems in doing so, in terms of performance
> or anything else?
>
> Any information would be appreciated.

Lucene's phrase query does not support such 'sub-parts' (alternatives at a
single position). But you may want to look at o.a.l.search.spans. It seems
that a span-near query made of span-term queries and span-or queries, with
the (max) span set to roughly the length of your phrase and in-order=true,
would get pretty close.

As for performance, I hope others can comment, because I have never
compared this to a phrase query. When you do try this, please tell us about
any interesting performance results!

Regards,
Doron
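To make the trade-off discussed above concrete, here is a minimal, dependency-free Java sketch (Java 9+ for `List.of`/`Set.of`). It is not Lucene code: it assumes lowercased, whitespace-tokenized text, and the class and method names (`SpanOrPhraseSketch`, `exampleSlots`, `expandedPhraseCount`, `matchesInOrder`) are hypothetical. `expandedPhraseCount` shows how the OR-of-phrases form multiplies with the alternatives per position, and `matchesInOrder` is a toy model of what an in-order span-near over span-or clauses matches: one alternative per position, in order, with at most `slop` unmatched tokens in total. In Lucene itself the query would be assembled from `SpanTermQuery`, `SpanOrQuery` and `SpanNearQuery(SpanQuery[] clauses, int slop, boolean inOrder)`; the `int` there is a slop (allowed unmatched positions between the clauses) rather than a total span length, so a slop of 0 with `inOrder=true` corresponds to an exact phrase.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class SpanOrPhraseSketch {

    // Example from the thread: one set of accepted (lowercased) terms per phrase position.
    static List<Set<String>> exampleSlots() {
        return List.of(
                Set.of("the"), Set.of("login"), Set.of("fraud", "fruad"),
                Set.of("involves"), Set.of("an"), Set.of("impostor", "imposter"));
    }

    // Number of exact phrases needed if every combination of alternatives
    // is spelled out as its own phrase and the phrases are OR-ed together.
    static long expandedPhraseCount(List<Set<String>> slots) {
        long count = 1;
        for (Set<String> alternatives : slots) {
            count *= alternatives.size();
        }
        return count;
    }

    // Toy model of an in-order span-near over span-or clauses: true if the
    // tokens contain one alternative from each slot, in slot order, with at
    // most 'slop' unmatched tokens in total between the first and last match.
    static boolean matchesInOrder(List<String> tokens, List<Set<String>> slots, int slop) {
        if (slots.isEmpty()) {
            return true;
        }
        for (int start = 0; start < tokens.size(); start++) {
            if (slots.get(0).contains(tokens.get(start))
                    && extend(tokens, slots, 1, start, start, slop)) {
                return true;
            }
        }
        return false;
    }

    // Backtracking step: try to place slot 'slot' at some position after 'prev'.
    private static boolean extend(List<String> tokens, List<Set<String>> slots,
                                  int slot, int prev, int start, int slop) {
        if (slot == slots.size()) {
            return true;
        }
        // Placing this slot at position p means (p - start - slot) tokens have
        // been skipped so far, so p may not exceed start + slot + slop.
        int limit = Math.min(tokens.size() - 1, start + slot + slop);
        for (int p = prev + 1; p <= limit; p++) {
            if (slots.get(slot).contains(tokens.get(p))
                    && extend(tokens, slots, slot + 1, p, start, slop)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Set<String>> slots = exampleSlots();
        System.out.println(expandedPhraseCount(slots)); // 4

        List<String> doc = Arrays.asList("the login fruad involves an imposter".split(" "));
        System.out.println(matchesInOrder(doc, slots, 0)); // true

        List<String> gapped = Arrays.asList("the login fraud scheme involves an impostor".split(" "));
        System.out.println(matchesInOrder(gapped, slots, 0)); // false
        System.out.println(matchesInOrder(gapped, slots, 1)); // true
    }
}
```

With two alternatives at each of two positions the OR-of-phrases form already needs 4 phrase queries, and with k alternatives at each of m positions it needs k^m, whereas the span-or form grows only with the total number of alternatives. Whether that translates into faster searches in Lucene is exactly the open performance question above.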