On 16 Jan 2008, at 00:33, solr_user wrote:

I did try the Lucene SpellChecker. Currently the Lucene SpellChecker does not have the ability to suggest splitting combined words. Is there a
plan to add this capability to the Lucene SpellChecker any time soon?

There are very few fixed plans in this project, but feel free to search the issue tracker for spell checker related patches.

I also did not quite understand your idea of producing N-word shingles and
then indexing them with the SpellChecker.  How will this help the
SpellChecker to suggest splitting of words?

Just as I suggested that you create shingles in an alternative field from your source data token streams in order to search for the typos, you could instead add the shingles to your Lucene contrib spell checker dictionary.

Let's say the text you index is "a b c d". Your standard text analysis creates the tokens "a", "b", "c" and "d". Create the shingles "ab", "bc" and "cd" by concatenating adjacent tokens, add these as words in the spell checker, and have it suggest the decomposed versions: "ab" => "a b", "bc" => "b c" and "cd" => "c d".
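
To make the mapping concrete, here is a minimal sketch in plain Python. It does not use the Lucene API at all; the function name build_split_suggestions is hypothetical, and it only illustrates how the concatenated two-word shingles could be paired with their decomposed forms before you feed them to whatever dictionary you use.

```python
def build_split_suggestions(tokens):
    """Map each concatenated pair of adjacent tokens to its split form.

    Hypothetical helper for illustration only, not part of Lucene.
    If two different pairs concatenate to the same string, the later
    pair overwrites the earlier one.
    """
    suggestions = {}
    # zip(tokens, tokens[1:]) yields every pair of neighbouring tokens.
    for left, right in zip(tokens, tokens[1:]):
        suggestions[left + right] = f"{left} {right}"
    return suggestions


# The example from the text: "a b c d" analysed into four tokens.
suggestions = build_split_suggestions(["a", "b", "c", "d"])
```

For the tokens above this gives one entry per adjacent pair, so a query term "bc" can be looked up and answered with "b c".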

Try to limit which shingles you add to the dictionary, or you will probably end up with a huge dictionary.
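
One possible way to limit them, and this is only an assumption on my part rather than anything the spell checker does for you, is to keep only the word pairs that occur at least a few times in your corpus. A rough sketch in plain Python, where frequent_shingles and min_count are hypothetical names:

```python
from collections import Counter


def frequent_shingles(documents, min_count=2):
    """Keep only adjacent word pairs seen at least min_count times.

    documents is an iterable of token lists. Hypothetical helper for
    illustration only; the threshold is an arbitrary assumption.
    """
    counts = Counter()
    for tokens in documents:
        # Count every pair of neighbouring tokens in this document.
        counts.update(zip(tokens, tokens[1:]))
    return {
        left + right: f"{left} {right}"
        for (left, right), n in counts.items()
        if n >= min_count
    }


docs = [
    ["spell", "checker", "dictionary"],
    ["lucene", "spell", "checker"],
]
kept = frequent_shingles(docs, min_count=2)
```

Here only "spellchecker" survives, because "spell checker" is the only pair that appears in both documents. Which threshold makes sense will depend on the size of your index.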

Please report back on the performance of such an implementation if you get around to it. It could be a great contribution.


--
karl


