> Is there another stemmer we can use that is perhaps not as
> aggressive as the Porter Stemmer.
"KStem is an alternative to Porter for developers looking for a less aggressive
stemmer. It was written by Bob Krovetz, ported to Lucene by Sergio Guzman-Lara
(UMASS Amherst)." [1]
[1]http://wiki.ap
Couldn't you just mod the PorterStemmer class for your requirements?
(we did and provided it a list of ignore words & phrases specific to
our needs)
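
For what it's worth, a minimal sketch of the ignore-list idea in plain Java. This is not our actual patch and not a Lucene API: the class name ProtectedWordStemmer and the toyStem function are hypothetical stand-ins, and toyStem only imitates the "Lowe's" -> "low" behaviour rather than implementing the Porter algorithm.

```java
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Lucene: wraps any stemming function and
// skips it for words found on a caller-supplied ignore list.
class ProtectedWordStemmer {
    private final Set<String> protectedWords; // expected to be lowercase
    private final UnaryOperator<String> stemmer;

    ProtectedWordStemmer(Set<String> protectedWords, UnaryOperator<String> stemmer) {
        this.protectedWords = protectedWords;
        this.stemmer = stemmer;
    }

    // Lowercases the token, then stems it unless it is on the ignore list.
    String stem(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        return protectedWords.contains(lower) ? lower : stemmer.apply(lower);
    }

    // Toy stand-in for a real stemmer (assumption, NOT the Porter algorithm):
    // strips a trailing "'s", then a trailing "ing" or "e".
    static String toyStem(String word) {
        String w = word.endsWith("'s") ? word.substring(0, word.length() - 2) : word;
        if (w.length() > 5 && w.endsWith("ing")) return w.substring(0, w.length() - 3);
        if (w.length() > 3 && w.endsWith("e")) return w.substring(0, w.length() - 1);
        return w;
    }

    public static void main(String[] args) {
        ProtectedWordStemmer s =
            new ProtectedWordStemmer(Set.of("lowe's"), ProtectedWordStemmer::toyStem);
        System.out.println(s.stem("Lowe's"));   // protected, left unstemmed
        System.out.println(s.stem("indexing")); // not protected, stemmed
    }
}
```

The same wrapper has to be applied at both index time and query time, otherwise the protected form in the index will never match the stemmed form of the query.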
On Sat, Jan 9, 2010 at 4:00 AM, Jamie wrote:
> Hi All
>
> Is there another stemmer we can use that is perhaps not as aggressive as the
> Porter Stem
Hi All
Is there another stemmer we can use that is perhaps not as aggressive as
the Porter Stemmer? That is, the stemming could remove ing's and er's, but not
do something so significant as to convert "Lowe's" to "Low".
Thanks
Jamie
Will Murnane wrote:
On Fri, Jan 8, 2010 at 16:27, Jamie wrote:
> Hi Ian / Will
>
> Thanks. Surely, the Porter Stemmer should not stem proper nouns, i.e. it
> could check the capitalization of the first letter of a word and whether or
> not the word is the start of a sentence. If so, it could choose not to apply any
> ste
Hi Ian / Will
Thanks. Surely, the Porter Stemmer should not stem proper nouns, i.e.
it could check the capitalization of the first letter of a word and
whether or not the word is at the start of a sentence. If so, it could choose
not to apply any stemming. Or am I completely out of whack?
Jamie
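
To make the heuristic concrete, here is a rough sketch in plain Java, under my reading of the proposal: a capitalized token that is not the first token of its sentence is treated as a likely proper noun and left unstemmed. The class CapitalizationAwareStemming and the toyStem function are hypothetical; toyStem is a stand-in and not the Porter algorithm, and nothing here is a Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

// Hypothetical sketch: skip stemming for tokens that look like proper nouns.
class CapitalizationAwareStemming {

    // tokens: the words of ONE sentence, in order, with original casing.
    static List<String> stemSentence(List<String> tokens, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            boolean capitalized = !t.isEmpty() && Character.isUpperCase(t.charAt(0));
            // A capital at position 0 says nothing: every sentence starts with one.
            boolean likelyProperNoun = capitalized && i > 0;
            String lower = t.toLowerCase(Locale.ROOT);
            out.add(likelyProperNoun ? lower : stemmer.apply(lower));
        }
        return out;
    }

    // Toy stand-in for a real stemmer (assumption): strips a trailing "'s" or "ing".
    static String toyStem(String word) {
        return word.replaceAll("('s|ing)$", "");
    }

    public static void main(String[] args) {
        List<String> sentence = List.of("Shopping", "at", "Lowe's", "today");
        System.out.println(stemSentence(sentence, CapitalizationAwareStemming::toyStem));
    }
}
```

Two limitations show up straight away: a proper noun at the start of a sentence still gets stemmed, and a query typed in lowercase ("lowe's") carries no capitalization signal, so it would be stemmed and would no longer match the unstemmed term in the index.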
I
Looks like PorterStemFilter converts "Lowe's" to low. Not very surprising.
Options include
. Drop the stemming
. Index stemmed and non-stemmed variants and search both, maybe
boosting the non-stemmed variant.
If you really want exact matches only, you may also (or instead) want
untokenized fields.
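
The second option above can be sketched roughly as follows. This is a toy in-memory index in plain Java, not Lucene code: the class DualFieldIndex, its whitespace tokenizer, the scoring (1.0 per stemmed hit plus a caller-chosen boost per exact hit) and the toy stemmer are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical toy index: every token is stored twice, once as-is ("exact")
// and once stemmed, mimicking two differently analyzed fields.
class DualFieldIndex {
    private final Map<String, Set<Integer>> exact = new HashMap<>();
    private final Map<String, Set<Integer>> stemmed = new HashMap<>();
    private final UnaryOperator<String> stemmer;

    DualFieldIndex(UnaryOperator<String> stemmer) {
        this.stemmer = stemmer;
    }

    // Naive whitespace tokenization; punctuation handling is out of scope.
    void add(int docId, String text) {
        for (String tok : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (tok.isEmpty()) continue;
            exact.computeIfAbsent(tok, k -> new HashSet<>()).add(docId);
            stemmed.computeIfAbsent(stemmer.apply(tok), k -> new HashSet<>()).add(docId);
        }
    }

    // Searches both variants; exact hits add exactBoost on top of the stemmed score.
    Map<Integer, Double> search(String term, double exactBoost) {
        String lower = term.toLowerCase(Locale.ROOT);
        Map<Integer, Double> scores = new HashMap<>();
        for (int id : stemmed.getOrDefault(stemmer.apply(lower), Set.of())) {
            scores.merge(id, 1.0, Double::sum);
        }
        for (int id : exact.getOrDefault(lower, Set.of())) {
            scores.merge(id, exactBoost, Double::sum);
        }
        return scores;
    }

    public static void main(String[] args) {
        // Toy stemmer (assumption, not Porter): strips a trailing "'s", then a trailing "e".
        DualFieldIndex idx =
            new DualFieldIndex(w -> w.replaceAll("'s$", "").replaceAll("e$", ""));
        idx.add(1, "Lowe's store");
        idx.add(2, "low prices");
        System.out.println(idx.search("Lowe's", 2.0));
    }
}
```

Documents containing only "low" still come back, but the document with the literal "Lowe's" scores higher. Searching only the exact map corresponds to dropping the stemming for that query.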
On Fri, Jan 8, 2010 at 15:01, Jamie wrote:
> Hi There
>
> We are trying to search for the exact word "Lowe's" across a large set of
> indexed data. Our results include everything with "low" in it. Thus, we are
> receiving a much larger data set than we expected. The data is indexed
> using the an