On 22 May 2013 at 20:29, Petite Abeille wrote:
>
> On May 22, 2013, at 7:08 PM, Karl Wettin <karl.wet...@kodapan.se> wrote:
>
>>> * Use a filter after ASCIIFoldingFilter that discriminates all uses of ae,
>>> oe, oo, and other combinations of double vowels, just keeping the first one.
>>
>> I ended up with that solution.
>>
>> https://issues.apache.org/jira/browse/LUCENE-5013
>
> Interesting problem… perhaps you could generalize your solution a bit… for
> example, in, say, German, one could substitute 'ue' for 'ü', etc… so it looks
> like what you are after is folding double vowels… irrespective of how they
> got there…
>
> So… assuming something along the lines of Sean M. Burke's Unidecode [1] for the
> purpose of ASCII transliteration, what's left is simply to fold double
> vowels, e.g.:

I pasted your reply as a comment on the JIRA issue.

Hmm, interesting thought, though. I have to consider whether it makes sense to
make it this generic. I think it might be problematic for some languages,
especially Dutch.


                karl
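
For illustration, the double-vowel folding discussed above could look roughly like the following. This is only a sketch under stated assumptions, not the LUCENE-5013 patch: the class name DoubleVowelFolder and the method fold are hypothetical, it works on plain strings rather than as a Lucene TokenFilter, it assumes the input has already been through ASCII folding, it treats only a, e, i, o and u as vowels, and it collapses any run of consecutive vowels to the first one.

```java
// Hypothetical helper, not part of Lucene: folds runs of consecutive vowels
// down to the first vowel, e.g. "ae", "oe", "oo", "ue" -> "a", "o", "o", "u".
// Assumes the input has already been ASCII-folded upstream.
final class DoubleVowelFolder {

    private DoubleVowelFolder() {
    }

    // Assumption: only these five letters count as vowels (y is excluded).
    private static boolean isVowel(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a':
            case 'e':
            case 'i':
            case 'o':
            case 'u':
                return true;
            default:
                return false;
        }
    }

    // Keeps the first vowel of every vowel run and drops the rest.
    static String fold(String term) {
        StringBuilder out = new StringBuilder(term.length());
        boolean previousWasVowel = false;
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            boolean vowel = isVowel(c);
            if (!vowel || !previousWasVowel) {
                out.append(c);
            }
            previousWasVowel = vowel;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("mueller")); // German 'ue' spelling of 'ü'
        System.out.println(fold("haakan"));  // 'aa' spelling of 'å'
        System.out.println(fold("maan"));    // Dutch: collides with "man"
    }
}
```

With these rules "mueller" and "muller" end up as the same term, which is the intent. The Dutch concern may be visible here too: "maan" (moon) folds to "man" (man) and "boot" (boat) folds to "bot", so a fully generic filter would merge words that are distinct in that language.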