Hi Ian / Will
Thanks. Surely the Porter stemmer should not stem proper nouns? For
example, it could check whether the first letter of a word is
capitalized and whether the word is at the start of a sentence. If the
word is capitalized and not at the start of a sentence, it could choose
not to apply any stemming. Or am I completely out of whack?
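To make the idea concrete, here is a rough sketch in plain Java rather than Lucene code. Everything in it is an assumption for illustration: ProperNounGuard, analyze and toyStem are made-up names, and toyStem is a crude placeholder, not the Porter algorithm. Note that the check would have to run before lowercasing, since in our chain LowerCaseFilter comes before PorterStemFilter and the capitalization is gone by then.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class ProperNounGuard {

    /** Placeholder stemmer for illustration only; NOT the Porter algorithm. */
    public static String toyStem(String word) {
        if (word.endsWith("'s")) {
            word = word.substring(0, word.length() - 2);
        }
        if (word.length() > 3 && word.endsWith("e")) {
            word = word.substring(0, word.length() - 1);
        }
        return word;
    }

    /**
     * Lowercases every token, but only stems it when it does not look like
     * a proper noun (capitalized and not the first word of a sentence).
     */
    public static List<String> analyze(String text, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        boolean sentenceStart = true;
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // A token ending in '.', '!' or '?' closes the current sentence.
            boolean endsSentence = raw.matches(".*[.!?]");
            // Drop trailing punctuation before inspecting the word itself.
            String word = raw.replaceAll("[.,!?;:]+$", "");
            if (!word.isEmpty()) {
                boolean capitalized = Character.isUpperCase(word.charAt(0));
                String lower = word.toLowerCase(Locale.ROOT);
                boolean looksProper = capitalized && !sentenceStart;
                out.add(looksProper ? lower : stemmer.apply(lower));
            }
            sentenceStart = endsSentence;
        }
        return out;
    }
}
```

For the input "We shop at Lowe's. Lowe's prices are low." this yields [we, shop, at, lowe's, low, prices, are, low]. The mid-sentence "Lowe's" survives, but the sentence-initial one is still stemmed, which is an obvious weakness of the heuristic.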
Jamie
Ian Lea wrote:
Looks like PorterStemFilter converts "Lowe's" to low. Not very surprising.
Options include
. Drop the stemming
. Index stemmed and non-stemmed variants and search both, maybe
boosting the non-stemmed variant.
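The second option could look roughly like the sketch below. It is a toy in-memory index in plain Java, not Lucene code: DualFieldIndex, toyStem and the boost value are assumptions made up for illustration, and toyStem is a placeholder rather than the Porter algorithm. Each token is stored in both an exact and a stemmed posting map, and a query consults both, weighting exact hits higher.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

public class DualFieldIndex {
    // One posting map per "field": term -> ids of documents containing it.
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    private final Map<String, Set<Integer>> stemmed = new HashMap<>();
    private final UnaryOperator<String> stemmer;
    private final double exactBoost;

    public DualFieldIndex(UnaryOperator<String> stemmer, double exactBoost) {
        this.stemmer = stemmer;
        this.exactBoost = exactBoost;
    }

    /** Placeholder stemmer for illustration only; NOT the Porter algorithm. */
    public static String toyStem(String word) {
        if (word.endsWith("'s")) {
            word = word.substring(0, word.length() - 2);
        }
        if (word.length() > 3 && word.endsWith("e")) {
            word = word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Indexes each lowercased token twice: as-is and stemmed. */
    public void add(int docId, String text) {
        for (String tok : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (tok.isEmpty()) {
                continue;
            }
            exact.computeIfAbsent(tok, k -> new HashSet<>()).add(docId);
            stemmed.computeIfAbsent(stemmer.apply(tok), k -> new HashSet<>()).add(docId);
        }
    }

    /** Searches both variants; exact hits add exactBoost, stemmed hits add 1.0. */
    public Map<Integer, Double> search(String term) {
        String t = term.toLowerCase(Locale.ROOT);
        Map<Integer, Double> scores = new HashMap<>();
        for (int id : exact.getOrDefault(t, Collections.emptySet())) {
            scores.merge(id, exactBoost, Double::sum);
        }
        for (int id : stemmed.getOrDefault(stemmer.apply(t), Collections.emptySet())) {
            scores.merge(id, 1.0, Double::sum);
        }
        return scores;
    }
}
```

Indexing "sale at lowe's today" as document 1 and "low prices" as document 2 with a boost of 2.0, a search for "Lowe's" scores document 1 at 3.0 and document 2 at 1.0, so exact matches rank first while stemmed matches are still found. In Lucene itself the same effect would typically come from two fields with different analyzers and a boosted query clause.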
If you really want exact matches only, you may also/instead want
untokenized fields. Apostrophes etc can be a problem. Look into what
analyzers do and use Luke to see what is indexed.
--
Ian.
On Fri, Jan 8, 2010 at 8:01 PM, Jamie <ja...@stimulussoft.com> wrote:
Hi There
We are trying to search for the exact word "Lowe's" across a large set of
indexed data. Our results include everything with "low" in it. Thus, we are
receiving a much larger result set than we expected. The data is indexed
using the following analyzer:
TokenStream result = new StandardTokenizer(reader);
result = new StandardFilter(result);
result = new LowerCaseFilter(result);
result = new StopFilter(result, stopTable);
result = new PorterStemFilter(result);
return result;
Thanks
Jamie
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
--
Stimulus Software - MailArchiva
Email Archiving And Compliance
USA Tel: +1-713-343-8824 ext 100
UK Tel: +44-20-80991035 ext 100
Email: ja...@stimulussoft.com
Web: http://www.mailarchiva.com