Thanks for the reply.

> -----Original Message-----
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Tuesday, June 12, 2012 1:14 PM
> To: java-user@lucene.apache.org
> Subject: Re: Stemming - limited index expansion
>
> I don't completely follow precisely what you want to do, but the
> WordDelimiterFilter is an example of a token filter that outputs an extra
> token at the same position, such as with its CATENATE_ALL/WORDS/NUMBERS
> options.

Thanks for directing me to that. I'm currently using 3.4, and it doesn't
appear in the 3.6 code base. If it doesn't show up until 4.0+ (your link is
actually 5.0!), I know that "Terms are no longer required to be character
based. Lucene views a term as an arbitrary byte[]"
-- https://builds.apache.org/job/Lucene-trunk/javadoc/changes/Changes.html#4.0.0-alpha.api_changes

But hopefully it is at the right level to suggest how this would be done
using the old CharRef instead of whatever the new stuff uses (ByteRef?).
I'll take a look.

> Maybe you simply want to internally call some existing stemmer filter and
> output both the original and stemmed term at the same location?

Yes, that is very close to what I want to do, possibly with the addition of
stemming only a limited set of words (but more than just plurals).

-Paul
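
For illustration, the idea discussed in this thread (keep the original term, and for a limited set of words also emit the stemmed form at the same position) might be sketched as below. This is a plain-Java model only and deliberately uses no Lucene classes: `StemExpansionSketch`, `Token`, `toyStem`, and `expand` are hypothetical names, and the suffix-stripping stemmer is a stand-in for a real one. In Lucene, "same position" corresponds to a position increment of 0 on the extra token, which is what the `positionIncrement` field models here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java model of "limited index expansion"; no Lucene classes are used.
class StemExpansionSketch {

    // Hypothetical token: text plus a position increment
    // (0 means "same position as the previous token").
    static final class Token {
        final String text;
        final int positionIncrement;

        Token(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return text + "(+" + positionIncrement + ")";
        }
    }

    // Toy stemmer, an assumption for illustration only: strips a few suffixes.
    static String toyStem(String word) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (word.endsWith(suffix) && word.length() > suffix.length() + 2) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Emits every original word; for words in the stemmable set, also emits
    // the stem at the same position when it differs from the original.
    static List<Token> expand(List<String> words, Set<String> stemmable) {
        List<Token> out = new ArrayList<Token>();
        for (String word : words) {
            out.add(new Token(word, 1));
            if (stemmable.contains(word)) {
                String stem = toyStem(word);
                if (!stem.equals(word)) {
                    out.add(new Token(stem, 0));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stemmable =
                new HashSet<String>(Arrays.asList("running", "jumped"));
        List<Token> tokens =
                expand(Arrays.asList("dogs", "running", "jumped"), stemmable);
        System.out.println(tokens);
    }
}
```

Run as written, this prints `[dogs(+1), running(+1), runn(+0), jumped(+1), jump(+0)]`: "dogs" is left alone because it is not in the limited set, while the two listed words each gain a stemmed token that shares their position.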