Re: Applying SpellChecker to a phrase

Doron Cohen Mon, 03 Dec 2007 22:25:02 -0800

See below -

smokey <[EMAIL PROTECTED]> wrote on 03/12/2007 05:14:23:


> Suppose I have an index containing the terms impostor, imposter, fraud,
> and fruad. Then presumably, regardless of whether I spell impostor and
> fraud correctly, Lucene SpellChecker will offer the improperly spelled
> versions as corrections. This means that the phrase "The login fraud
> involves an impostor" would need to expand to:
>
> "The login fraud involves an impostor" OR "The login fruad involves an
> impostor" OR "The login fraud involves an imposter" OR "The login fruad
> involves an imposter" to cover all cases and thus find all possible
> matches.
>
> However, that feels like an awful lot of matches to perform on the index.
> A more efficient approach would be to expand the query to "The login
> (fraud OR fruad) involves an (impostor OR imposter)", which should be
> logically equivalent to the first (longer) query.
>
> So my questions are:
> (1) whether others have generated queries of the form "The login (fraud OR
> fruad) involves an (impostor OR imposter)" when applying SpellChecker to a
> phrase, and agree that this indeed performs better than the first one;
> (2) whether others have observed any problems in doing so, in terms of
> performance or anything else.
>
> Any information would be appreciated.

Lucene's phrase query does not support nested 'sub parts' such as an OR
inside the phrase. But you may want to look at o.a.l.search.spans. It
seems that a span-near query (SpanNearQuery) made of span-term queries
(SpanTermQuery) and span-or queries (SpanOrQuery), setting (max)span as
~the length of your phrase and setting in-order=true, would get
pretty close.
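
To make the intended matching concrete, here is a small plain-Java sketch.
It is not Lucene code and does not touch an index: the class and method
names (SpellPhraseSketch, expand, matchesCompact, matchesExpanded) are made
up for illustration, and tokens are assumed to be already lowercased and
split on whitespace. It only models what the compact form "one alternative
per position, adjacent and in order" should accept, and checks that against
the long OR-of-phrases form from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SpellPhraseSketch {

    // Cartesian product of the per-position alternatives: every phrase
    // that the long "phrase OR phrase OR ..." form would have to list.
    static List<String> expand(List<List<String>> slots) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (List<String> slot : slots) {
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String word : slot) {
                    next.add(prefix.isEmpty() ? word : prefix + " " + word);
                }
            }
            phrases = next;
        }
        return phrases;
    }

    // Compact form: true if the tokens contain, at adjacent positions and
    // in order, one alternative from each slot.
    static boolean matchesCompact(List<String> tokens, List<List<String>> slots) {
        for (int start = 0; start + slots.size() <= tokens.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < slots.size() && ok; i++) {
                ok = slots.get(i).contains(tokens.get(start + i));
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    // Long form: true if any fully expanded phrase occurs in the tokens.
    static boolean matchesExpanded(List<String> tokens, List<List<String>> slots) {
        String text = " " + String.join(" ", tokens) + " ";
        for (String phrase : expand(slots)) {
            if (text.contains(" " + phrase + " ")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<List<String>> slots = Arrays.asList(
                Arrays.asList("the"), Arrays.asList("login"),
                Arrays.asList("fraud", "fruad"), Arrays.asList("involves"),
                Arrays.asList("an"), Arrays.asList("impostor", "imposter"));
        List<String> doc = Arrays.asList(
                "the", "login", "fruad", "involves", "an", "imposter");
        System.out.println(expand(slots).size());      // prints 4
        System.out.println(matchesCompact(doc, slots)); // prints true
    }
}
```

With two alternatives at two positions the long form needs 2 x 2 = 4
phrases, and the count multiplies with every additional misspelled term,
while the compact form keeps one slot per position. Whether the span
version is actually faster on a real index is a separate question that
this sketch does not answer.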

As for performance, I hope others can comment, because I have never
compared this to a phrase query. If you do try it, please tell us of any
interesting performance results!

Regards,
Doron

