Porter is a little outdated I've found KStem much better http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters/Kstem
You'll still need a good protected word list, but KStem is just a little nicer On Fri, Jan 16, 2009 at 6:20 PM, David Woodward <d...@loc.gov> wrote: > Hi. > > Any good protwords.txt out there? > > In a fairly standard solr analyzer chain, we use the English Porter > analyzer like so: > > <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> > > For most purposes the porter does just fine, but occasionally words come > along that really don't work out to well, e.g., > > "maine" is stemmed to "main" - clearly goofing up precision about "Maine" > without doing much good for variants of "main". > > So - I have an entry for my protwords.txt. What else should go in there? > > Thanks for your ideas, > > Dave Woodward > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > >