On Sun, Jun 1, 2014 at 5:58 PM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> As a Finnish-speaker, I hope that patch doesn't become default behavior.
> Too many times, we have been victimized by the German conventions. A
> Finnish-speaker would much rather see
>
>    Järvenpää => Jarvenpaa
>    Öllölä => Ollola
>    Kärkkäinen => Karkkainen
>
> than
>
>    Järvenpää => Jaervenpaeae
>    Öllölä => Oelloelae
>    Kärkkäinen => Kaerkkaeinen
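For anyone who wants to see the two conventions side by side, here is a rough sketch. The function names and the translation table are my own illustration (the table is nowhere near a complete standard), and this is not the patch under discussion:

```python
import unicodedata

# German-style expansions. Illustrative only, not a complete standard.
GERMAN_STYLE = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def strip_marks(text):
    """Decompose to NFD, then drop the combining marks (ä -> a)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def german_style(text):
    """Expand umlauts the German way (ä -> ae)."""
    # Normalize to NFC first so the table sees precomposed characters.
    return unicodedata.normalize("NFC", text).translate(GERMAN_STYLE)

for name in ("Järvenpää", "Öllölä", "Kärkkäinen"):
    print(name, "=>", strip_marks(name), "/", german_style(name))
```

Neither function is "right"; which one you want depends on whose names you are mangling.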
It's even worse than that. The rules for ASCIIfying adorned characters
vary according to context - Müller and Mueller are different names, and
in many contexts they should sort and compare differently. I also
remember reading somewhere that there's a context in which it's more
useful to decompose ü to u rather than to ue. There is no "safe" lossy
transformation that can be applied to any language's words, and this is
no exception.

ASCIIfication has to be accepted as flawed. This issue (an inability to
handle non-ASCII labels) is similar to what happens with a lot of blog
URLs:

http://rosuav.blogspot.com/2013/08/20th-international-g-festival-awards.html

That post is about the "International G&S Festival" awards, but the URL
drops the "&S" part. (If you absolutely have to transmit something
losslessly in pure ASCII, you need a scheme like Punycode, which is a
lot less clean and readable than a decomposition scheme.)

Of course, the better solution is to permit the full Unicode alphabet in
identifiers...

ChrisA
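PS: To make the Punycode point concrete, here is a minimal sketch using the "punycode" codec that ships with Python. The helper names to_ascii and from_ascii are mine, purely for illustration:

```python
def to_ascii(label):
    """Encode a label losslessly into pure ASCII via Punycode."""
    return label.encode("punycode").decode("ascii")

def from_ascii(encoded):
    """Reverse to_ascii and recover the original label."""
    return encoded.encode("ascii").decode("punycode")

for name in ("Müller", "Mueller", "Järvenpää"):
    encoded = to_ascii(name)
    # The round trip is exact, unlike any decomposition scheme.
    assert from_ascii(encoded) == name
    print(name, "=>", encoded)
```

Müller and Mueller stay distinct and both survive the round trip, but the encoded forms are not something you would want a human to read.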