Snowball and accents filter...?

Andrew Green Thu, 26 Apr 2007 12:41:14 -0700

Hi, all,

Another quick request succinct for code examples, or an explanation of
what we're doing wrong here.


We've successfully gotten the Snowball Spanish stemmer working in our
test harness. An example that works perfectly: texts that contain
"civilización" or "civilizaciones" produce hits on searches for either
"civilización" or "civilizaciones". However...

...it's quite likely that in many of our search requests the user will
omit the accents on letters, and it's not impossible that some documents
will contain misspelled words with wrong or missing accents.

So we in addition to stemming we need to remove accents from both the
index and the search queries... I think.

In order to do this, we tried subclassing the SnowballAnalyzer... it
doesn't work yet, though. Here is the code of our custom class:
        
        
        public class SnowballAnalyzerWithoutAccents extends SnowballAnalyzer {
        
                public SnowballAnalyzerWithoutAccents(String name, String[] 
stopWords) {
                        super(name, stopWords);
                }
        
                public TokenStream tokenStream(String fieldName, Reader reader) 
{
                        TokenStream result = super.tokenStream(fieldName, 
reader);
                        result = new ISOLatin1AccentFilter(result);
                        return result;
                }
        
        }
        
Basically we create an instance of this class and use it when we create
the IndexWriter and QueryParser objects.

Any tips/code examples? What should we do differently?

Many thanks in advance,
Andrew Green


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Snowball and accents filter...?

Reply via email to