Re: spell checking for combined words

solr_user Tue, 15 Jan 2008 15:33:34 -0800

I did try the Lucene SpellChecker.  Currently it does not have the ability
to suggest splitting combined words.  Is there a plan to add this
capability to the Lucene SpellChecker any time soon?
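
To make the requested behaviour concrete, here is a minimal sketch of a split suggestion, assuming the indexed terms are available as a plain set of words. It is not Lucene SpellChecker API; the function name `suggest_splits` and the `min_part` parameter are hypothetical, and only two-way splits are tried.

```python
def suggest_splits(token, vocabulary, min_part=2):
    """Return "left right" suggestions where both halves are known words.

    `vocabulary` stands in for the terms in the index (an assumption);
    `min_part` is a hypothetical lower bound on the length of each half.
    """
    suggestions = []
    # Try every split point that leaves at least `min_part` characters per side.
    for i in range(min_part, len(token) - min_part + 1):
        left, right = token[:i], token[i:]
        if left in vocabulary and right in vocabulary:
            suggestions.append(f"{left} {right}")
    return suggestions


vocabulary = {"apple", "computer", "app"}
print(suggest_splits("applecomputer", vocabulary))
```

Each query token costs one pass over its split points and two set lookups per point, and the sketch only covers a token made of exactly two known words.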


I also did not quite understand your idea of producing N-word shingles and
then indexing them with the SpellChecker.  How will this help the
SpellChecker to suggest splitting of words?
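
One possible reading of the shingle suggestion, sketched in plain Python rather than with Lucene classes: if every pair of adjacent words is also stored in joined form, then "applecomputer" becomes a dictionary entry that can carry its spaced form as the suggestion. The names `build_joined_shingles` and `suggest` are hypothetical, and the lookup is an exact match where a real spell checker would presumably also tolerate near matches.

```python
from collections import defaultdict


def build_joined_shingles(texts, size=2):
    """Map each run of `size` adjacent words, joined without spaces,
    to the spaced form(s) it came from."""
    joined = defaultdict(set)
    for text in texts:
        words = text.lower().split()
        for i in range(len(words) - size + 1):
            shingle = words[i:i + size]
            joined["".join(shingle)].add(" ".join(shingle))
    return joined


def suggest(token, joined):
    """Return the spaced forms recorded for a combined token, if any."""
    return sorted(joined.get(token.lower(), ()))


dictionary = build_joined_shingles(["Think different Apple Computer"])
print(suggest("applecomputer", dictionary))
```

Under this reading the cost moves to index time: the dictionary grows with the number of distinct adjacent word pairs, and only pairs that actually occur in the indexed text can ever be suggested.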




Otis Gospodnetic wrote:
> 
> Have you tried the Lucene spellchecker first?  I think it could be adapted
> to do what you want, especially with the help of LUCENE-400 to produce
> N-word shingles (which you can then index with the Spellchecker).  I'm
> quite sure this could be done, in fact, and it would be a nice addition to
> Spellchecker in general.
> 
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> 
> ----- Original Message ----
> From: solr_user <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Tuesday, January 15, 2008 1:14:06 PM
> Subject: Re: spell checking for combined words
> 
> 
> I don't have a list of common "combined word" queries.  Splitting of words
> seems to be quite a standard thing; most search engines and spell checkers
> have this ability.  It would be nice if Lucene provided this out of the
> box.
> 
> 
> karl wettin-3 wrote:
>> 
>> 
>> On 14 Jan 2008, at 19:47, solr_user wrote:
>> 
>>> Does the Lucene spell checker have the ability to suggest splitting of
>>> combined words?  For example, if I have the words "apple" and "computer"
>>> in my index and I type "applecomputer", how can I make it suggest
>>> "apple computer"?
>> 
>> 
>> It would probably be very expensive to analyze each query token and
>> decompose it using some word list or index.
>> 
>> If "apple computer" is a phrase that is common in your queries, then you
>> can create a new field with chained tokens: "think different apple
>> computer" -> "thinkdifferent differentapple applecomputer". This would at
>> least allow you to search for such typos. You might want to set some
>> thresholds, like min/max token size, in your TokenFilter.
>> 
>> 
>> -- 
>> karl
>> 
>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>> 
>> 
>> 
> 
> -- 
> View this message in context:
> 
> http://www.nabble.com/spell-checking-for-combined-words-tp14809197p14843700.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/spell-checking-for-combined-words-tp14809197p14853050.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.

