On Jul 8, 2005, at 8:44 AM, mark harwood wrote:
You can get the unstemmed word by re-analysing the (hopefully stored somewhere) text. Look at the tokens emitted from the TokenStream and when you get to the one that matches the stemmed form you can use the token offset info to retrieve the unstemmed form from the original text.
Wouldn't that fall down if you had two distinct terms which produce the same string when stemmed?
Best, Marvin Humphrey Rectangular Research http://www.rectangular.com/ --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]