On 16 Jan 2008, at 00:33, solr_user wrote:
> I did try the Lucene SpellChecker. Currently the Lucene SpellChecker does
> not have the ability to suggest splitting of combined words. Is there a
> plan to add this capability to the Lucene SpellChecker any time soon?
Very few plans in this project, but feel free to search the issue
tracker for spell checker related patches.
> I also did not quite understand your idea of producing N-word shingles and
> then indexing them with the SpellChecker. How will this help the
> SpellChecker to suggest splitting of words?
Just as I suggested creating shingles in an alternative field from your
source data token streams in order to search for the typos, you could
instead add the shingles to your Lucene contrib spell checker
dictionary.
Let's say the text you index is "a b c d". Your standard text analysis
creates the tokens "a", "b", "c" and "d". Create the shingles "ab", "bc"
and "cd" and add these as words in the spell checker, each one suggesting
its decomposed version: "ab" => "a b", "bc" => "b c" and "cd" => "c d".
Try to limit what shingles you add to the dictionary or you will
probably end up with a huge dictionary.
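A rough sketch of the idea, in plain Python rather than against the Lucene
API. The function name build_split_dictionary and the min_count cutoff are
made up for illustration; min_count is just one possible way to keep the
dictionary small, by dropping word pairs that are rarely seen next to each
other.

```python
from collections import Counter


def build_split_dictionary(tokens, min_count=1):
    """Map each concatenated 2-word shingle to its decomposed form.

    tokens    -- the token stream produced by the normal text analysis
    min_count -- hypothetical cutoff: keep only adjacent pairs seen at
                 least this many times, to limit dictionary size
    """
    # Count every pair of adjacent tokens in the stream.
    pairs = Counter(zip(tokens, tokens[1:]))
    # Key: the tokens glued together; value: the split suggestion.
    # If two different pairs glue to the same string, the later one wins.
    return {
        left + right: left + " " + right
        for (left, right), count in pairs.items()
        if count >= min_count
    }


tokens = "a b c d".split()
split_dictionary = build_split_dictionary(tokens)
```

The keys of split_dictionary are what you would add to the spell checker
as words. When the spell checker suggests one of them for a query term,
look it up in the map to get the split form to present to the user.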
Please report back on the performance of such an implementation if you
get around to it. It could be a great contribution.
--
karl