Couldn't you just modify the PorterStemmer class to suit your requirements? We did, and gave it a list of ignore words and phrases specific to our needs.
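To illustrate the idea, here is a rough sketch rather than our actual code. It wraps a stemmer instead of editing PorterStemmer itself, and checks each term against a protected list before stemming. Everything named here (ProtectedStemmer, toyStem, the sample terms) is hypothetical and not part of Lucene. toyStem is only a stand-in so the example runs on its own; it is not the Porter algorithm. In practice you would pass in a call to your real stemmer. Multi-word phrases are not handled; those would need to be matched before tokenizing.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical wrapper: none of these names come from Lucene.
final class ProtectedStemmer {
    // Lower-cased terms that must never be stemmed.
    private final Set<String> protectedTerms;
    // The stemmer to delegate to for everything else (assumed: your Porter implementation).
    private final UnaryOperator<String> delegate;

    ProtectedStemmer(Set<String> protectedTerms, UnaryOperator<String> delegate) {
        this.protectedTerms = protectedTerms;
        this.delegate = delegate;
    }

    String stem(String term) {
        // Compare case-insensitively, since terms are usually downcased before stemming.
        String key = term.toLowerCase(Locale.ROOT);
        if (protectedTerms.contains(key)) {
            return term; // protected: hand back unchanged
        }
        return delegate.apply(term);
    }

    // Toy stand-in for a real stemmer: strips a few common suffixes.
    // This is NOT the Porter algorithm; it only makes the sketch runnable.
    static String toyStem(String term) {
        for (String suffix : new String[] {"'s", "ing", "er", "s"}) {
            if (term.length() > suffix.length() + 2 && term.endsWith(suffix)) {
                return term.substring(0, term.length() - suffix.length());
            }
        }
        return term;
    }

    public static void main(String[] args) {
        ProtectedStemmer stemmer =
                new ProtectedStemmer(Set.of("lowe's"), ProtectedStemmer::toyStem);
        System.out.println(stemmer.stem("lowe's"));  // on the protected list, left alone
        System.out.println(stemmer.stem("walking")); // not protected, passed to the delegate
    }
}
```

The lookup happens before the delegate is called, so the stemmer's own rules stay untouched and the protected list can be changed without touching the stemming code.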

On Sat, Jan 9, 2010 at 4:00 AM, Jamie <ja...@stimulussoft.com> wrote:
> Hi All
>
> Is there another stemmer we can use that is perhaps not as aggressive as the
> Porter Stemmer? I.e. the stemming could remove ing's, er's, but not do
> something so significant as to convert "Lowe's" to "Low".
>
> Thanks
>
> Jamie
>
> Will Murnane wrote:
>>
>> On Fri, Jan 8, 2010 at 16:27, Jamie <ja...@stimulussoft.com> wrote:
>>
>>> Hi Ian / Will
>>>
>>> Thanks. Surely the Porter Stemmer should not stem proper nouns, i.e. it
>>> could check the capitalization of the first letter of a word and whether
>>> or not the word is the start of a sentence. If so, it could choose not to
>>> apply any stemming. Or am I completely out of whack?
>>>
>>
>> Look again: you're downcasing the terms before the Porter filter ever
>> sees them (which is, AIUI, necessary). You might do well to combine
>> the tokenizing and downcasing step with some heuristic to find proper
>> nouns and not downcase or stem them.
>>
>> Will
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>
>
> --
> Stimulus Software - MailArchiva
> Email Archiving And Compliance
> USA Tel: +1-713-343-8824 ext 100
> UK Tel: +44-20-80991035 ext 100
> Email: ja...@stimulussoft.com
> Web: http://www.mailarchiva.com
> To receive MailArchiva Enterprise Edition product announcements, send a
> message to: <mailarchiva-enterprise-edition-subscr...@stimulussoft.com>