I did try the Lucene SpellChecker. Currently the Lucene SpellChecker cannot suggest splitting combined words. Is there a plan to add this capability any time soon?
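To make the request concrete, this is the kind of suggestion I am after, as a minimal sketch in plain Java. It does not use any Lucene API; SplitSuggester, suggestSplits and the knownWords set are names I made up for illustration, with the set standing in for the terms in my index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
public class SplitSuggester {

    /** Returns every two-word split of token whose two halves are both known words. */
    public static List<String> suggestSplits(String token, Set<String> knownWords) {
        List<String> suggestions = new ArrayList<>();
        // Try every cut position between the first and last character.
        for (int i = 1; i < token.length(); i++) {
            String left = token.substring(0, i);
            String right = token.substring(i);
            if (knownWords.contains(left) && knownWords.contains(right)) {
                suggestions.add(left + " " + right);
            }
        }
        return suggestions;
    }

    public static void main(String[] args) {
        // Assumption: this set stands in for the terms of the index.
        Set<String> knownWords = new HashSet<>(Arrays.asList("apple", "computer"));
        System.out.println(suggestSplits("applecomputer", knownWords)); // [apple computer]
    }
}
```

It only tries two-word splits and checks every cut position, so it is an illustration of the behavior I would like, not a proposal for how SpellChecker should implement it.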
I also did not quite understand your idea of producing N-word shingles and then indexing them with the SpellChecker. How will this help the SpellChecker to suggest splitting of words?

Otis Gospodnetic wrote:
>
> Have you tried the Lucene spellchecker first? I think it could be adapted
> to do what you want, esp. with the help of LUCENE-400 to produce N-word
> shingles (which you can then index with the Spellchecker). I'm quite sure
> this could be done, in fact, and would be a nice addition to Spellchecker
> in general.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: solr_user <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, January 15, 2008 1:14:06 PM
> Subject: Re: spell checking for combined words
>
> I don't have a list of common "combined word" queries. Splitting of words
> seems to be quite a standard thing; most search engines and spell checkers
> have this ability. It would be nice if Lucene provided this out of the box.
>
> karl wettin-3 wrote:
>>
>> On 14 Jan 2008, at 19:47, solr_user wrote:
>>
>>> Does Lucene spell checker have the ability to suggest splitting of
>>> combined words. So for e.g. if I have got the word "apple" and
>>> "computer" in my index and if I type "applecomputer" then how can I
>>> make it suggest "apple computer"
>>
>> It would probably be very expensive to analyze each query token and
>> decompose it using some word list or index.
>>
>> If "apple computer" is a phrase that is common in your query, then you
>> can create a new field with chained tokens: "think diffrent apple
>> computer" -> "thinkdiffrent diffrentapple applecomputer". This would
>> at least allow you to search for such typos. You might want to set
>> some thresholds like min/max token size and such in your TokenFilter.
>>
>> --
>> karl

--
View this message in context: http://www.nabble.com/spell-checking-for-combined-words-tp14809197p14853050.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
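P.S. For what it is worth, here is how I currently read Karl's chained-token suggestion, as a plain-Java sketch with no Lucene classes involved. ChainedTokenIndex and chainAdjacentWords are names I made up; in a real index the chaining would presumably happen in a TokenFilter, which I have not written, and the min/max token size thresholds Karl mentions are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not part of Lucene.
public class ChainedTokenIndex {

    /**
     * Maps each chained pair of adjacent words (e.g. "applecomputer")
     * to its spaced form (e.g. "apple computer").
     */
    public static Map<String, String> chainAdjacentWords(String text) {
        // LinkedHashMap keeps the chained tokens in the order they were produced.
        Map<String, String> chained = new LinkedHashMap<>();
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i + 1 < words.length; i++) {
            chained.put(words[i] + words[i + 1], words[i] + " " + words[i + 1]);
        }
        return chained;
    }

    public static void main(String[] args) {
        // Same example text as in Karl's message.
        Map<String, String> chained = chainAdjacentWords("think diffrent apple computer");
        System.out.println(chained.keySet());             // [thinkdiffrent, diffrentapple, applecomputer]
        System.out.println(chained.get("applecomputer")); // apple computer
    }
}
```

If this reading is right, a lookup of "applecomputer" can be mapped back to "apple computer", but only for word pairs that actually occur next to each other in the indexed text.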