Re: umlauts / diacritic expansion

2019-04-16 Thread Ralf Heyde
Ah sorry, ASCII folding for umlauts will result in ue/ae, ß/ss, etc. You could allow a distance of 1 or 2 if you use Levenshtein distance; this might be close to what you need. Sent from my iPhone > On 16.04.2019 at 20:08, Michael Sokolov wrote: > > I'm learning how to index/search
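
To illustrate the edit-distance suggestion above, here is a minimal sketch in plain Python (not Lucene code). The function name `levenshtein` and the sample terms are illustrative assumptions. It only shows why a distance of 1 or 2 could bridge the diacritic, expanded, and stripped spellings of one word.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    # Hypothetical sample terms: diacritic form, expanded form, stripped form.
    for left, right in [("f\u00fcr", "fuer"), ("f\u00fcr", "fur"), ("fur", "fuer")]:
        print(left, right, levenshtein(left, right))
```

A fuzzy match with this tolerance would also accept unrelated words that happen to be one or two edits away, so it is a rough approximation rather than a targeted umlaut rule.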

RE: umlauts / diacritic expansion

2019-04-16 Thread Markus Jelsma
Hello Michael, for the case of normalizing ü to ue, take a look at the German normalizer [1]. Regards, Markus [1] https://lucene.apache.org/core/7_6_0/analyzers-common/org/apache/lucene/analysis/de/GermanNormalizationFilter.html -Original message- > From: Ralf Heyde > Sent: Tuesday

Re: umlauts / diacritic expansion

2019-04-16 Thread Ralf Heyde
Hey, take a look at ASCIIFoldingFilter - this one is quite generic. Does this answer your question? Cheers, Ralf. Sent from my iPhone > On 16.04.2019 at 20:08, Michael Sokolov wrote: > > I'm learning how to index/search German today and understanding that > vowels with umlauts are conv

umlauts / diacritic expansion

2019-04-16 Thread Michael Sokolov
I'm learning how to index/search German today and understand that vowels with umlauts are conventionally expanded into two ASCII characters, e.g. "für" -> "fuer", so people may search for the expanded form "fuer", but they might also search with the diacritic, and finally they might lazily search
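
The spelling variants described in the question can be sketched in plain Python (this is not Lucene code). The helper names `expand`, `strip_diacritics`, and `search_forms` are hypothetical. The third, diacritic-stripped form ("fur") is an assumption about what a "lazy" query would look like, since the message preview is cut off at that point.

```python
import unicodedata

# Conventional German two-letter expansions, plus sharp s.
EXPANSIONS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def expand(term: str) -> str:
    """Expand umlauts into two ASCII letters, e.g. 'für' -> 'fuer'."""
    term = unicodedata.normalize("NFC", term)  # make sure umlauts are single code points
    return "".join(EXPANSIONS.get(ch, ch) for ch in term)


def strip_diacritics(term: str) -> str:
    """Drop combining marks, e.g. 'für' -> 'fur' (assumed 'lazy' spelling)."""
    decomposed = unicodedata.normalize("NFD", term.replace("ß", "ss"))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def search_forms(term: str) -> set:
    """All spellings a user might plausibly type for one indexed term."""
    return {unicodedata.normalize("NFC", term), expand(term), strip_diacritics(term)}


if __name__ == "__main__":
    print(sorted(search_forms("f\u00fcr")))
```

Whether to index all of these forms or to reduce index and query terms to a single normalized form is a design choice the thread leaves open; the sketch only enumerates the candidates.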