On 22 May 2013, at 20:29, Petite Abeille wrote:

> 
> On May 22, 2013, at 7:08 PM, Karl Wettin <karl.wet...@kodapan.se> wrote:
> 
>>> * Use a filter after ASCIIFoldingFilter that collapses every use of ae, 
>>> oe, oo, and other combinations of double vowels, keeping just the first one.
>> 
>> I ended up with that solution.
>> 
>> https://issues.apache.org/jira/browse/LUCENE-5013
> 
> Interesting problem… perhaps you could generalize your solution a bit… for 
> example, in, say, German, one could substitute 'ue' for 'ü', etc… so it looks 
> like what you are after is folding double vowels… irrespective of how they 
> got there…
> 
> So… assuming something along the lines of Sean M. Burke's Unidecode [1] for the 
> purpose of ASCII transliteration, what's left is simply to fold double 
> vowels, e.g.:

I pasted your reply as a comment in the JIRA issue.

Hmm, interesting thought. I have to consider whether it makes sense to make it 
this generic. I think it might be problematic for some languages, though, 
especially Dutch.
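
To make the Dutch concern concrete, here is a minimal sketch in plain Java, 
not the LUCENE-5013 patch and not a Lucene TokenFilter. It assumes that only 
a, e, i, o and u count as vowels, and it uses java.text.Normalizer as a rough 
stand-in for ASCIIFoldingFilter (it only strips combining marks). The class 
and method names are made up for illustration.

```java
import java.text.Normalizer;

public class DoubleVowelFolder {

    // Assumption: only these five letters are treated as vowels.
    private static final String VOWELS = "aeiou";

    // Rough stand-in for ASCIIFoldingFilter: decompose, then drop combining marks.
    static String stripDiacritics(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Collapse each run of consecutive vowels to its first vowel.
    static String foldDoubleVowels(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean prevVowel = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            boolean vowel = VOWELS.indexOf(Character.toLowerCase(c)) >= 0;
            if (vowel && prevVowel) {
                continue; // second or later vowel in a run: drop it
            }
            out.append(c);
            prevVowel = vowel;
        }
        return out.toString();
    }

    static String fold(String s) {
        return foldDoubleVowels(stripDiacritics(s));
    }

    public static void main(String[] args) {
        System.out.println(fold("M\u00fcnchen")); // Munchen
        System.out.println(fold("Muenchen"));     // Munchen
        System.out.println(fold("maan"));         // man
        System.out.println(fold("man"));          // man
    }
}
```

The two German spellings end up as the same term, which is the goal. But the 
Dutch words "maan" (moon) and "man" (man) also collapse to the same term, 
because a doubled vowel in Dutch marks a different vowel, not an alternative 
spelling.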



                        karl


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
