Hi Ian / Will

Thanks. Surely the Porter stemmer should not stem proper nouns? For example, it could check whether the first letter of a word is capitalized and whether the word starts a sentence. If the word is capitalized and does not start a sentence, it could choose not to apply any stemming. Or am I completely out of whack?
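
To make the idea concrete, here is a rough sketch of that heuristic in plain Java. It is not Lucene code and not how PorterStemFilter works: the class ProperNounGuard, its methods, and the stemmer passed in as a function are all hypothetical names for illustration, and sentence boundaries are guessed naively from trailing punctuation. Note that the check needs the original-case tokens, so in the analyzer chain quoted further down it would have to run before LowerCaseFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

class ProperNounGuard {

    // Hypothetical heuristic: a capitalized token that does not start a
    // sentence is treated as a proper noun.
    static boolean looksLikeProperNoun(String token, boolean sentenceStart) {
        return !sentenceStart && !token.isEmpty()
                && Character.isUpperCase(token.charAt(0));
    }

    // Lowercases every token and stems it, except suspected proper nouns,
    // which are lowercased but left unstemmed. Tokens must still carry their
    // original case. A token ending in '.', '!' or '?' is assumed to end a
    // sentence (an assumption; abbreviations will fool it).
    static List<String> process(List<String> tokens, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        boolean sentenceStart = true;
        for (String raw : tokens) {
            String word = raw.replaceAll("[.!?,;:]+$", "");
            if (!word.isEmpty()) {
                String lower = word.toLowerCase(Locale.ROOT);
                out.add(looksLikeProperNoun(word, sentenceStart)
                        ? lower
                        : stemmer.apply(lower));
            }
            sentenceStart = raw.matches(".*[.!?]$");
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in stemmer for the demo only: drops one trailing "s".
        UnaryOperator<String> toy =
                s -> s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
        System.out.println(process(Arrays.asList("We", "like", "Lowes", "stores."), toy));
    }
}
```

The obvious weakness is that a proper noun at the start of a sentence still gets stemmed, and any capitalized common word mid-sentence does not.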

Jamie


Ian Lea wrote:
Looks like PorterStemFilter converts "Lowe's" to "low".  Not very surprising.

Options include

.  Drop the stemming

.  Index stemmed and non-stemmed variants and search both, maybe
boosting the non-stemmed variant.
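
The second option can be sketched in plain Java as below. This is only an illustration of the idea, not Lucene code: DualFieldIndex, toyStem and EXACT_BOOST are made-up names, the boost value is arbitrary, and toyStem is a crude placeholder rather than the Porter algorithm. In Lucene itself this would normally mean two fields analyzed differently (PerFieldAnalyzerWrapper can assign an analyzer per field) and a query against both.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class DualFieldIndex {
    // Hypothetical weight for a hit on the unstemmed term.
    static final double EXACT_BOOST = 2.0;

    private int nextDocId = 0;
    private final Map<String, Set<Integer>> exactField = new HashMap<>();
    private final Map<String, Set<Integer>> stemmedField = new HashMap<>();

    // Placeholder stemmer, NOT Porter: drops a trailing "'s", then a
    // trailing "e" on words longer than three letters.
    static String toyStem(String token) {
        if (token.endsWith("'s")) token = token.substring(0, token.length() - 2);
        if (token.length() > 3 && token.endsWith("e")) token = token.substring(0, token.length() - 1);
        return token;
    }

    // Indexes each whitespace-separated token twice: as-is and stemmed.
    int add(String text) {
        int docId = nextDocId++;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) continue;
            exactField.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            stemmedField.computeIfAbsent(toyStem(token), k -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    // Scores documents: EXACT_BOOST for an unstemmed hit plus 1.0 for a stemmed hit.
    Map<Integer, Double> score(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        Map<Integer, Double> scores = new TreeMap<>();
        for (int id : exactField.getOrDefault(t, Collections.emptySet())) {
            scores.merge(id, EXACT_BOOST, Double::sum);
        }
        for (int id : stemmedField.getOrDefault(toyStem(t), Collections.emptySet())) {
            scores.merge(id, 1.0, Double::sum);
        }
        return scores;
    }

    // Returns matching document ids, best score first.
    List<Integer> search(String term) {
        Map<Integer, Double> scores = score(term);
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ids;
    }
}
```

With documents "Lowe's opens a new store" and "prices are low today" added in that order, a search for "Lowe's" still returns both, but the document containing the literal term ranks first. Querying only the unstemmed field would give exact matches alone.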

If you really want exact matches only, you may also/instead want
untokenized fields.  Apostrophes etc. can be a problem.  Look into what
the analyzers do and use Luke to see what is actually indexed.

--
Ian.


On Fri, Jan 8, 2010 at 8:01 PM, Jamie <ja...@stimulussoft.com> wrote:
Hi There

We are trying to search for the exact word "Lowe's" across a large set of
indexed data. Our results include everything with "low" in it. Thus, we are
receiving a much larger data set than we expected. The data is indexed
using the following analyzer:
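          // Tokenize, then apply the filters in order
          TokenStream result = new StandardTokenizer(reader);
          result = new StandardFilter(result);         // strips trailing 's and acronym dots
          result = new LowerCaseFilter(result);        // case is discarded here, before stemming
          result = new StopFilter(result, stopTable);  // removes stop words
          result = new PorterStemFilter(result);       // Porter stemming
          return result;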

Thanks

Jamie






---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



--
Stimulus Software - MailArchiva
Email Archiving And Compliance
USA Tel: +1-713-343-8824 ext 100
UK Tel: +44-20-80991035 ext 100
Email:  ja...@stimulussoft.com
Web: http://www.mailarchiva.com
To receive MailArchiva Enterprise Edition product announcements, send a message to: 
<mailarchiva-enterprise-edition-subscr...@stimulussoft.com>

