Just thought I'd comment since I had to do word processing before indexing in my application as well. Matt's method is pretty similar to what I did. I wrote a filter that transforms the tokens as they get indexed (and also use that for searching). Since I am indexing a block of words, rather than one document per word, I store the word in its original form in the payload of the token so I can retrieve it from the search. If your documents contain one word each, then Matt's suggestion of a stored field would be the right way to go. If not, I would suggest using payloads.
- Greg