Hi Martin,

When you write your own tokenizer/analyzer for this, you'll probably want to emit multiple tokens for words that have umlauts and such: one version with ä -> ae, the other with ä -> a, perhaps.
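Roughly what I have in mind, as a minimal sketch in plain Java rather than a real Lucene TokenFilter. The class and method names (UmlautVariants, expand, strip, variants) are made up for illustration and are not part of Lucene. In an actual filter you would presumably emit the second variant at the same position as the first (position increment 0), but that wiring is left out here.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not a Lucene class: computes the token variants
// one might index for a single word.
final class UmlautVariants {

    // Classic German transcription: a-umlaut -> ae, o-umlaut -> oe,
    // u-umlaut -> ue, sharp s -> ss. Other characters pass through.
    static String expand(String word) {
        StringBuilder sb = new StringBuilder(word.length() + 4);
        for (int i = 0; i < word.length(); i++) {
            char c = word.charAt(i);
            switch (c) {
                case '\u00e4': sb.append("ae"); break; // a-umlaut
                case '\u00f6': sb.append("oe"); break; // o-umlaut
                case '\u00fc': sb.append("ue"); break; // u-umlaut
                case '\u00c4': sb.append("Ae"); break; // A-umlaut
                case '\u00d6': sb.append("Oe"); break; // O-umlaut
                case '\u00dc': sb.append("Ue"); break; // U-umlaut
                case '\u00df': sb.append("ss"); break; // sharp s
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Plain accent stripping: decompose to NFD, drop the combining marks,
    // and map sharp s (which has no decomposition) to "ss" by hand.
    static String strip(String word) {
        String decomposed = Normalizer.normalize(word, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").replace("\u00df", "ss");
    }

    // Both variants, in order and without duplicates:
    // first the "ae" form (with any remaining accents stripped),
    // then the bare "a" form.
    static Set<String> variants(String word) {
        Set<String> out = new LinkedHashSet<>();
        out.add(strip(expand(word)));
        out.add(strip(word));
        return out;
    }
}
```

With this, variants("Öffner") gives "Oeffner" and "Offner", while variants("Oeffner") gives just "Oeffner", so a document containing either spelling shares a token with a query for the other. Note that letters such as ø or ł have no Unicode decomposition, so the NFD trick does not touch them; they would need explicit mappings like the ones in expand.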
As for stripping accents from characters, somebody posted ISOLatinFilter.java (I think that was the class name) a few months back.

If you can contribute your Analyzers, that would be great, as we already have a small set of Analyzers in Lucene's contrib area in SVN.

Otis

--- Martin Rode <[EMAIL PROTECTED]> wrote:

> Hello everybody,
>
> First of all, congrats on that great piece of software!
>
> I am working on a Europe-wide project where we have texts in more
> than one European language, namely French, German, and English.
> I have tried both the GermanAnalyzer and the FrenchAnalyzer, and
> neither is satisfactory for what I need.
>
> The GermanAnalyzer should do a classic German umlaut conversion:
>
> ä -> ae
> ö -> oe
> ü -> ue
>
> It does ä -> a, ö -> o, ü -> u. This is not useful. If a word appears
> as "Oeffner" and I search for "Öffner", I don't find it! It does the
> conversion right for "ß", which converts to "ss".
>
> For French I tried the FrenchAnalyzer, but it does not work (at least
> not the one in lukeall.jar, which is pretty up-to-date, I guess).
>
> Well, in short, it would be nice to have a simple Analyzer which does
> the great job of the StandardAnalyzer, PLUS a few extras for European
> languages, and that is pretty easy:
>
> For German: see above.
> For French: remove all the ` ´ ^ accents and the hook below the c.
> For Swedish, Polish, Czech ...: remove everything that crosses,
> slashes, or otherwise decorates the ASCII letters a-z.
>
> Before I write my own Analyzer for that, I was wondering if anybody
> has had the same problems and already found a solution!
>
> Thanks for your help!
>
> Take care,
> Martin