Hi,
If some mispellings are very common, you could also turn them into synonyms.  I 
have not tried finding any information about this, but I *think* Google may be 
doing that.  I run a social service called Simpy at simpy.com and have Google 
Alerts for "simpy", but those alerts often contain matches on "simply".  So 
Google must account for people often misspelling the word "simply" as "simpy" 
and using synonyms is one way to handle this.  Look at the synonym stuff in 
Lucene in Action or even Solr for for to implement this type of stuff.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Karl Wettin <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, January 15, 2008 8:40:44 AM
Subject: Re: spell checking for combined words


14 jan 2008 kl. 19.47 skrev solr_user:

> Does Lucene spell checker have the ability to suggest splitting of  
> combined
> words.  So for e.g. if I have got the word "apple" and "computer" in
  
> my
> index and if I type "applecomputer" then how can I make it suggest
> "apple computer"


It would probably be very expensive to analyze each query token and  
decompose it using some word list or index.

If "apple computer" is a phrase that is common in your query, then you
  
can create a new field with chained tokens: "think diffrent apple  
computer" -> "thinkdiffrent diffrentapple applecomputer". This would  
at least allow you to search for such typos. You might want to set  
some threadholds like min/max token size and such in your TokenFilter.


-- 
karl



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to