On May 22, 2013, at 7:08 PM, Karl Wettin <karl.wet...@kodapan.se> wrote:
>> * Use a filter after ASCIIFoldingFilter that discriminate all use of ae, oe,
>> oo, and other combination of double vowels, just keeping the first one.
>
> I ended up with that solution.
>
> https://issues.apache.org/jira/browse/LUCENE-5013

Interesting problem… perhaps you could generalize your solution a bit.
In German, for example, one can substitute 'ue' for 'ü', etc. So it
looks like what you are after is folding double vowels, irrespective of
how they got there.

So… assuming something along the lines of Sean M. Burke's Unidecode [1]
for the ASCII transliteration, what's left is simply to fold double
vowels, e.g.:

-- gsub returns the new string plus a match count;
-- the outer parentheses keep only the string
local function fold( aString )
  return ( Unidecode( aString ):lower():gsub( '([aeiou]?)([aeiou]?)', '%1' ) )
end

print( 1, fold( 'blåbærsyltetøj' ) )
print( 2, fold( 'blåbärsyltetöj' ) )
print( 3, fold( 'blaabaarsyltetoej' ) )
print( 4, fold( 'blabarsyltetoj' ) )
print( 5, fold( 'Räksmörgås' ) )
print( 6, fold( 'Göteborg' ) )
print( 7, fold( 'Gøteborg' ) )
print( 8, fold( 'Über' ) )
print( 9, fold( 'ueber' ) )
print( 10, fold( 'uber' ) )
print( 11, fold( 'uuber' ) )

> 1 blabarsyltetoj
> 2 blabarsyltetoj
> 3 blabarsyltetoj
> 4 blabarsyltetoj
> 5 raksmorgas
> 6 goteborg
> 7 goteborg
> 8 uber
> 9 uber
> 10 uber
> 11 uber

[1] http://search.cpan.org/~sburke/Text-Unidecode-0.04/lib/Text/Unidecode.pm
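
For anyone without a Unidecode binding at hand, here is a minimal self-contained sketch of the same folding. The TRANSLIT table and the to_ascii and fold helpers are hypothetical names introduced for illustration; the table covers only the letters used in the examples and is a stand-in for a real transliterator, not a replacement for one. It also assumes the source file is saved as UTF-8.

```lua
-- Hypothetical stand-in for Unidecode: covers only the letters used in
-- the examples. A real transliterator handles far more.
local TRANSLIT = {
  ['å'] = 'a',  ['Å'] = 'A',
  ['ä'] = 'a',  ['Ä'] = 'A',
  ['æ'] = 'ae', ['Æ'] = 'AE',
  ['ö'] = 'o',  ['Ö'] = 'O',
  ['ø'] = 'o',  ['Ø'] = 'O',
  ['ü'] = 'u',  ['Ü'] = 'U',
}

-- Replace each multi-byte UTF-8 sequence (lead byte 0xC2-0xF4 followed by
-- continuation bytes 0x80-0xBF) found in TRANSLIT. Sequences missing from
-- the table are left untouched, as are plain ASCII characters.
local function to_ascii( aString )
  return ( aString:gsub( '[\194-\244][\128-\191]*', TRANSLIT ) )
end

-- Transliterate, lowercase, then keep only the first vowel of each pair of
-- adjacent vowels. The outer parentheses drop gsub's match count.
local function fold( aString )
  return ( to_ascii( aString ):lower():gsub( '([aeiou]?)([aeiou]?)', '%1' ) )
end

for _, word in ipairs{ 'blåbærsyltetøj', 'Räksmörgås', 'Über', 'ueber', 'poet' } do
  print( word, fold( word ) )
end
```

Note that the fold is lossy by design: it collapses any pair of adjacent vowels, not only those produced by transliteration, so 'poet' comes out as 'pot'. Whether that is acceptable presumably depends on the index and on how much recall matters over precision.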