Paul Boddie wrote:
> what I'd like to see, for a change, is some kind
> of analysis of the prior art in connection with this matter. Java has
> had extensive UTF-8 support all over the place for ages, but either
> no-one here has any direct experience with the consequences of this
> support, or they are more interested in arguing about it as if it were
> a hypothetical situation when it is, in fact, a real-life situation
> that can presumably be observed and measured.

It's difficult to extract this analysis from Java. Most people I know from the Java world do not use this feature because it is error-prone. Java does not have support for *explicit* source encodings, i.e. the local environment settings win. This is bound to fail, e.g. on a latin-1 system where I would like to work with UTF-8 files (which tend to work better on the Unix build server, etc.).

In the Python world, these problems are solved now and will disappear when UTF-8 becomes the default encoding (note that this does not invert the problem, as people using non-UTF-8 encodings will then just set the respective encoding tag in their files). So there is not much Python can learn from Java here, except for what it already does better.

I am actually working on a couple of Java projects that use German identifiers, transliterated to prevent the encoding problems inherent to Java. The transliteration makes things harder to read than necessary - and this is only German-vs-English, i.e. simple things like 'ae' instead of 'ä' and 'ss' instead of 'ß'. But sometimes names become hard to read that way, or look like different words. It also leads to all sorts of weirdly mixed names, because sometimes it is easier to write the similar-looking (although maybe not completely synonymous) English word instead of the transliterated German one.

So, yes, in a way, the code quality in these projects suffers from developers not being able to freely write Unicode identifiers.

Stefan
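
P.S. For anyone unfamiliar with the encoding tag mentioned above, here is a minimal sketch of how it behaves. It assumes Python 3 (where non-ASCII identifiers are accepted); the helper name declared_encoding and the sample sources are made up for illustration. It only uses tokenize.detect_encoding() and compile() from the standard library, both of which look at the in-file tag and not at the local environment settings.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the encoding Python would use to decode these source bytes.

    Hypothetical helper for illustration: it wraps the standard
    tokenize.detect_encoding(), which looks for a BOM or an encoding
    tag in the first two lines and falls back to UTF-8 otherwise.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# A latin-1 encoded file that says so in its first line.
latin1_source = "# -*- coding: latin-1 -*-\ngröße = 42\n".encode("latin-1")

# A UTF-8 encoded file without any tag (UTF-8 is the fallback).
utf8_source = "größe = 42\n".encode("utf-8")

for name, source in (("latin-1 file", latin1_source), ("utf-8 file", utf8_source)):
    namespace = {}
    # compile() accepts bytes and honours the encoding tag itself,
    # so the result does not depend on the locale of the machine.
    exec(compile(source, name, "exec"), namespace)
    print(name, "->", declared_encoding(source), "->", namespace["größe"])
```

Both files define the same identifier although their bytes differ, because the decoding is decided by the file and not by the system it happens to be built on.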