Re: Lucene - Search breadth approach

2009-07-22 Thread Shai Erera
I don't use the Lucene stemming Analyzers. My version, if asked to keep the original tokens, sets the position of both stem and original to be the same, and adds another character to the stem version. During query, that Analyzer is usually instructed to not keep the original tokens, just the stems
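
The behaviour described in this message can be modelled in a few lines of plain Python. This is only a rough sketch under stated assumptions, not Shai's actual Analyzer and not Lucene API: `toy_stem` is a hypothetical stand-in for a real stemmer, `analyze` and its `keep_original` flag are illustrative names, and the `$` marker is an assumed choice of the extra character.

```python
MARKER = "$"  # assumed marker character appended to stems


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyze(text, keep_original):
    """Return (term, position) pairs for whitespace-separated words.

    The marked stem is always emitted. When keep_original is True, the
    original word is emitted as well, at the same position as its stem.
    """
    tokens = []
    for position, word in enumerate(text.lower().split()):
        tokens.append((toy_stem(word) + MARKER, position))
        if keep_original:
            tokens.append((word, position))
    return tokens


# Index time: keep both forms. Query time: stems only.
index_tokens = analyze("Testing tests", keep_original=True)
query_tokens = analyze("testing", keep_original=False)
```

Because the stem carries the marker and the original does not, an index term such as `test$` cannot be confused with a document that literally contains the word `test`.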

Re: Lucene - Search breadth approach

2009-07-22 Thread Erick Erickson
But as far as I know, it doesn't index the original term too (at the same offset), which you have to do if you want to distinguish between the two cases, I think. But I confess I've been out of the guts of Lucene for some time, so I could be way off. But you'd sure want to use a different toke

Re: Lucene - Search breadth approach

2009-07-22 Thread Shai Erera
Actually my stemming Analyzer adds a similar character to stems, to distinguish original tokens (like orig=test) from stems (testing --> test$).

On Wed, Jul 22, 2009 at 11:02 PM, Erick Erickson wrote:
> A closely related approach to what Shai outlined is to index the *original* token wit

Re: Lucene - Search breadth approach

2009-07-22 Thread Erick Erickson
A closely related approach to what Shai outlined is to index the *original* token with a special ender (say $) with a 0 increment (see SynonymAnalyzer in LIA). Then, whenever you determined you wanted to use the un-stemmed version, just add your token to the terms (i.e. testing$ when you didn't want
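
A minimal plain-Python model of this variant may help make the "0 increment" idea concrete. It is a sketch under assumptions, not Lucene code: `toy_stem` is a hypothetical stand-in for a real stemmer, and `tokens_with_increments`, `build_index` and `search` are illustrative names. The stem advances the position by 1, and the original token with the `$` ender is stacked on the same position with an increment of 0.

```python
from collections import defaultdict

ENDER = "$"  # special ender appended to the original (un-stemmed) token


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strips a few common suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens_with_increments(text):
    """Yield (term, position_increment) pairs for each word."""
    for word in text.lower().split():
        yield toy_stem(word), 1    # the stem advances the position
        yield word + ENDER, 0      # the marked original shares that position


def build_index(docs):
    """Build term -> {doc_id: [positions]} from a {doc_id: text} mapping."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        position = -1
        for term, increment in tokens_with_increments(text):
            position += increment
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, word, exact=False):
    """Look up the marked original when exact is True, else the stem."""
    word = word.lower()
    term = word + ENDER if exact else toy_stem(word)
    return sorted(index.get(term, {}))


docs = {1: "testing the parser", 2: "a test run"}
index = build_index(docs)
stemmed_hits = search(index, "testing")            # matches via the stem
exact_hits = search(index, "testing", exact=True)  # matches only "testing$"
```

Since both terms occupy the same position, phrase and proximity matching would treat the stemmed and un-stemmed forms as the same slot in the text, while the ender keeps the two lookups distinct.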

Re: Lucene - Search breadth approach

2009-07-22 Thread Shai Erera
Hi Robert, What you could do is use the Stemmer (as a TokenFilter I assume) and produce two tokens always - the stem and the original. Index both of them in the same position. Then tell your users that if they search for [testing], it will find results for 'testing', 'test' etc (the stems) and if