Hi,

If some misspellings are very common, you could also turn them into synonyms. I have not tried finding any information about this, but I *think* Google may be doing that. I run a social service called Simpy at simpy.com and have Google Alerts for "simpy", but those alerts often contain matches on "simply". So Google must account for people often misspelling the word "simply" as "simpy", and using synonyms is one way to handle this. Look at the synonym material in Lucene in Action, or even at Solr, for how to implement this type of thing.
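
A minimal sketch of the synonym idea, written in plain Java rather than against Lucene's or Solr's actual synonym support. The class and method names (MisspellingSynonyms, add, expand) are made up for illustration, and the misspelling table is assumed to be maintained by hand or mined from query logs:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not a Lucene class: maps common misspellings
// to the word the user most likely meant.
class MisspellingSynonyms {
    private final Map<String, String> corrections = new HashMap<>();

    // Register a known misspelling and the intended word.
    void add(String misspelling, String intended) {
        corrections.put(misspelling.toLowerCase(Locale.ROOT),
                        intended.toLowerCase(Locale.ROOT));
    }

    // Return the lowercased token and, if it is a known misspelling,
    // the intended word as an extra synonym at the same position.
    List<String> expand(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        out.add(lower);
        String intended = corrections.get(lower);
        if (intended != null) {
            out.add(intended);
        }
        return out;
    }
}
```

With add("simpy", "simply") registered, expand("simpy") yields both "simpy" and "simply", so a search for either form can match. In a real setup the same expansion would live in a TokenFilter at index or query time.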

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Karl Wettin <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Tuesday, January 15, 2008 8:40:44 AM
Subject: Re: spell checking for combined words

On 14 Jan 2008, at 19:47, solr_user wrote:

> Does Lucene spell checker have the ability to suggest splitting of
> combined words? So for e.g. if I have got the word "apple" and
> "computer" in my index and if I type "applecomputer" then how can
> I make it suggest "apple computer"

It would probably be very expensive to analyze each query token and decompose it using some word list or index.

If "apple computer" is a phrase that is common in your queries, then you can create a new field with chained tokens: "think diffrent apple computer" -> "thinkdiffrent diffrentapple applecomputer". This would at least allow you to search for such typos. You might want to set some thresholds, like min/max token size and such, in your TokenFilter.

--
karl

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
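
A rough sketch of the chained-token idea from the quoted message, again in plain Java and not a real Lucene TokenFilter. The class name ChainedTokens, the method chain, and the way the min/max thresholds are applied (to each input token's length) are assumptions made for illustration:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: joins each pair of adjacent
// tokens into one token with no separator.
class ChainedTokens {
    // Chain adjacent tokens, skipping any pair in which either token is
    // shorter than minLen or longer than maxLen characters.
    static List<String> chain(List<String> tokens, int minLen, int maxLen) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String first = tokens.get(i);
            String second = tokens.get(i + 1);
            if (inRange(first, minLen, maxLen) && inRange(second, minLen, maxLen)) {
                out.add(first + second);
            }
        }
        return out;
    }

    private static boolean inRange(String token, int minLen, int maxLen) {
        return token.length() >= minLen && token.length() <= maxLen;
    }
}
```

Calling chain on the tokens "think", "diffrent", "apple", "computer" with generous thresholds produces "thinkdiffrent", "diffrentapple", "applecomputer", which would go into the extra field so that a query for "applecomputer" finds documents containing "apple computer".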