You can get the unstemmed word by re-analysing the (hopefully stored somewhere) text. Look at the tokens emitted from the TokenStream and when you get to the one that matches the stemmed form you can use the token offset info to retrieve the unstemmed form from the original text.
Another option which avoids re-analysis is to store the TermVector with TermPositionVector info enabled. All the offsets are then stored in the index, rather than computed on-the-fly by an Analyzer. The highlighter in the sandbox can use both of these approaches to get the original forms. Cheers Mark ___________________________________________________________ Yahoo! Messenger - NEW crystal clear PC to PC calling worldwide with voicemail http://uk.messenger.yahoo.com --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]