On Tue, 23 Feb 2010, Rafael Capdevielle wrote:

I want to ask if there is any predefined way to strip plurals for the Spanish
language.

I already have a StopSpanishAnalyzer for stop words:

class StopSpanishAnalyzer(lucene.PythonAnalyzer):

   def __init__(self, stopWords):
       super(StopSpanishAnalyzer, self).__init__()
       self.stopWords = stopWords

   def tokenStream(self, fieldName, reader):
       result = lucene.StandardTokenizer(reader)
       result = lucene.LowerCaseFilter(result)
       result = lucene.StopFilter(result, self.stopWords)
       result = lucene.ISOLatin1AccentFilter(result)

Yes, add something like:

         result = lucene.SnowballFilter(result, lucene.SpanishStemmer())

The Snowball contrib is normally built along with PyLucene.

Andi..

       return result

Is there any way to modify this Analyzer to use the SpanishStemmer or another
solution to strip plurals in order to improve the accuracy of queries?

Thanks!

Rafael
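
For readers who want to see what the filter chain discussed above does to a query, here is a minimal pure-Python sketch that does not need PyLucene. It mirrors the same order of steps (tokenize, lowercase, stop words, accent stripping, plural stripping). The function names strip_accents, strip_plural and analyze are hypothetical, and strip_plural is a crude suffix rule written for illustration only. It is not the Snowball Spanish algorithm, and it over-stems words such as "lunes".

```python
import re
import unicodedata


def strip_accents(token):
    # Decompose accented characters and drop the combining marks.
    # Note: in this sketch "ñ" also becomes "n".
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def strip_plural(token):
    # Hypothetical, deliberately naive plural rule (NOT Snowball).
    if len(token) <= 3:
        return token
    if token.endswith("ces"):
        return token[:-3] + "z"       # luces -> luz
    if token.endswith("es") and token[-3] not in "aeiou":
        return token[:-2]             # ciudades -> ciudad
    if token.endswith("s"):
        return token[:-1]             # casas -> casa
    return token


def analyze(text, stop_words):
    # Same order as the analyzer in this thread: tokenize, lowercase,
    # remove stop words, strip accents, then strip plurals.
    tokens = re.findall(r"\w+", text)
    tokens = [t.lower() for t in tokens]
    tokens = [t for t in tokens if t not in stop_words]
    tokens = [strip_accents(t) for t in tokens]
    return [strip_plural(t) for t in tokens]


print(analyze("Las canciones y la canción", {"las", "la", "y"}))
# ['cancion', 'cancion']
```

The singular and plural forms end up as the same term, which is the effect a stemming filter is meant to have on both the index and the queries. Because the stop-word step runs before accent stripping here, the stop list has to contain the accented spellings.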
