Re: whats the correct way to do normalisation?

2006-11-07 Thread hans meiser
Hi,

On Nov 6, 2006, at 11:27 AM, hans meiser wrote:
>> public final Token next() throws java.io.IOException {
>>   final Token t = input.next();
>>   if (t == null)
>>     return null;
>>   return new Token(removeAccents(t.termText()), t.startOffset(),
>>   t.en
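The quoted `next()` wraps another token stream: it pulls a token, passes `null` through at the end of the stream, and otherwise returns a new token with rewritten text but the original start/end offsets. Below is a minimal plain-Java sketch of that wrapping pattern with no Lucene dependency. `SimpleToken`, `SimpleTokenStream` and `NormalizingFilter` are hypothetical stand-ins, not Lucene classes, and the normalisation function is passed in (here just a single ä -> ae replacement, since the body of `removeAccents` isn't shown). It assumes Java 9+ for `List.of`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

public class NormalizingFilterSketch {

    /** Hypothetical stand-in for a token: term text plus character offsets. */
    static final class SimpleToken {
        final String text;
        final int start;
        final int end;

        SimpleToken(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /** Hypothetical token source: next() returns null when the stream is exhausted. */
    interface SimpleTokenStream {
        SimpleToken next();
    }

    /** Wraps another stream and rewrites each term, keeping the original offsets. */
    static final class NormalizingFilter implements SimpleTokenStream {
        private final SimpleTokenStream input;
        private final UnaryOperator<String> normalize;

        NormalizingFilter(SimpleTokenStream input, UnaryOperator<String> normalize) {
            this.input = input;
            this.normalize = normalize;
        }

        @Override
        public SimpleToken next() {
            SimpleToken t = input.next();
            if (t == null) {
                return null; // end of stream: pass the signal through unchanged
            }
            // New term text, same offsets, so they still point at the original characters.
            return new SimpleToken(normalize.apply(t.text), t.start, t.end);
        }
    }

    /** Demo helper: serves tokens from a list, then null. */
    static SimpleTokenStream fromList(List<SimpleToken> tokens) {
        Iterator<SimpleToken> it = tokens.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Drains a stream into a list of term texts. */
    static List<String> terms(SimpleTokenStream stream) {
        List<String> out = new ArrayList<>();
        for (SimpleToken t = stream.next(); t != null; t = stream.next()) {
            out.add(t.text);
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleTokenStream source = fromList(List.of(
                new SimpleToken("B\u00e4r", 0, 3),
                new SimpleToken("Haus", 4, 8)));
        SimpleTokenStream filtered =
                new NormalizingFilter(source, s -> s.replace("\u00e4", "ae"));
        System.out.println(terms(filtered)); // prints [Baer, Haus]
    }
}
```

Keeping the offsets untouched matters because the rewritten term ("Baer") is longer than the original text ("Bär"); the offsets should keep describing where the word sits in the source document, not the length of the indexed term.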

Re: whats the correct way to do normalisation?

2006-11-06 Thread hans meiser
Hi,

> Did you take a look at IsoLatin1AccentFilter?

It does nearly what I need, but not perfectly.

public final Token next() throws java.io.IOException {
  final Token t = input.next();
  if (t == null)
    return null;
  return new Token(removeAccents(t.termText()), t.startO
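As far as I know, IsoLatin1AccentFilter folds an accented letter to its unaccented base, so ä becomes a, while German text is usually normalised by expansion (ä becomes ae). One way to get both behaviours is to expand the German umlauts and ß first, then strip whatever accents remain (for the French text). A minimal sketch, assuming Java 6+ for `java.text.Normalizer`; `normalizeTerm` is a hypothetical helper, and the Normalizer-based stripping is a swapped-in technique, not how IsoLatin1AccentFilter itself works.

```java
import java.text.Normalizer;

public class UmlautThenFold {

    /**
     * Hypothetical helper: German-style expansion of umlauts and sharp s first,
     * then plain accent stripping for everything else (e.g. French e-acute -> e).
     */
    static String normalizeTerm(String term) {
        // Compose first so that a base letter followed by a combining diaeresis
        // is seen as one umlaut character by the switch below.
        String composed = Normalizer.normalize(term, Normalizer.Form.NFC);
        StringBuilder sb = new StringBuilder(composed.length() + 4);
        for (int i = 0; i < composed.length(); i++) {
            char c = composed.charAt(i);
            switch (c) {
                case '\u00e4': sb.append("ae"); break; // a-umlaut
                case '\u00f6': sb.append("oe"); break; // o-umlaut
                case '\u00fc': sb.append("ue"); break; // u-umlaut
                case '\u00c4': sb.append("Ae"); break; // A-umlaut
                case '\u00d6': sb.append("Oe"); break; // O-umlaut
                case '\u00dc': sb.append("Ue"); break; // U-umlaut
                case '\u00df': sb.append("ss"); break; // sharp s
                default: sb.append(c); break;
            }
        }
        // Decompose whatever is left and drop the combining marks (the accents).
        String decomposed = Normalizer.normalize(sb, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        System.out.println(normalizeTerm("M\u00fcller")); // Mueller
        System.out.println(normalizeTerm("Stra\u00dfe")); // Strasse
        System.out.println(normalizeTerm("caf\u00e9"));   // cafe
    }
}
```

With folding alone, Müller would be indexed as "Muller"; expanding first gives "Mueller", which is also how German users tend to type the name without an umlaut key.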

whats the correct way to do normalisation?

2006-11-06 Thread hans meiser
Hi,

Lucene indexes documents from three different countries here (in English, German and French). I want to normalize some characters, such as umlauts:

ä -> ae

I did it in the following way:

New Analyzer:

public class SpecialCharsAnalyzer extends StandardAnalyzer {
  public SpecialCharsAnalyzer() {
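The ä -> ae mapping is one step in an analysis chain. A minimal plain-Java sketch of such a chain, without Lucene and under stated assumptions: `analyze` and `EXPANSIONS` are hypothetical names, the word splitting is a crude regex split rather than StandardTokenizer, input is assumed to be precomposed (NFC), and Java 9+ is assumed for `Map.of`. Because it lowercases before expanding, the table only needs lowercase keys, and running the same chain over the query text is what lets "MUELLER" match "Müller".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class UmlautAnalyzerSketch {

    // Lowercase-only table: the chain lowercases before this step runs.
    // Keys are written as Unicode escapes to keep the source ASCII.
    private static final Map<Character, String> EXPANSIONS = Map.of(
            '\u00e4', "ae",  // a-umlaut
            '\u00f6', "oe",  // o-umlaut
            '\u00fc', "ue",  // u-umlaut
            '\u00df', "ss"); // sharp s

    /**
     * Hypothetical analyzer-style chain: split into words, lowercase, expand umlauts.
     * Assumes precomposed (NFC) input.
     */
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String word : text.split("[^\\p{L}\\p{N}]+")) {
            if (word.isEmpty()) {
                continue; // leading separators produce an empty first element
            }
            String lower = word.toLowerCase(Locale.ROOT);
            StringBuilder sb = new StringBuilder(lower.length() + 4);
            for (int i = 0; i < lower.length(); i++) {
                char c = lower.charAt(i);
                String repl = EXPANSIONS.get(c);
                if (repl != null) {
                    sb.append(repl);
                } else {
                    sb.append(c);
                }
            }
            terms.add(sb.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // Same chain for the indexed text and for the user's query.
        System.out.println(analyze("M\u00fcller wohnt in der Stra\u00dfe")); // [mueller, wohnt, in, der, strasse]
        System.out.println(analyze("MUELLER"));                            // [mueller]
    }
}
```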