On Thu, Dec 20, 2012 at 8:23 AM, Ian Kelly <ian.g.ke...@gmail.com> wrote:
> On Wed, Dec 19, 2012 at 1:55 PM, <wxjmfa...@gmail.com> wrote:
>> Yes, it is correct (or can be considered as correct).
>> I do not wish to discuss the typographical problematic
>> of "Das Grosse Eszett". The web is full of pages on the
>> subject. However, I never succeeded to find an "official
>> position" from Unicode. The best information I found seem
>> to indicate (to converge), U+1E9E is now the "supported"
>> uppercase form of U+00DF. (see DIN).
>
> Is this link not official?
>
> http://unicode.org/cldr/utility/character.jsp?a=00DF
>
> That defines a full uppercase mapping to SS and a simple uppercase
> mapping to U+00DF itself, not U+1E9E. My understanding of the simple
> mapping is that it is not allowed to map to multiple characters,
> whereas the full mapping is so allowed.

Ahh, thanks, that explains why the other Unicode-aware language I
tried behaved differently.

Pike v7.9 release 5 running Hilfe v3.5 (Incremental Pike Frontend)
> string s="Stra\u00dfe";
> upper_case(s);
(1) Result: "STRA\337E"
> lower_case(upper_case(s));
(2) Result: "stra\337e"
> String.capitalize(lower_case(s));
(3) Result: "Stra\337e"

The output is the equivalent of repr(), and it uses octal escapes
where possible (for brevity), so \337 is its representation of U+00DF
(decimal 223, octal 337). In Pike, upper-casing and lower-casing this
character both leave it unchanged, which is consistent with the simple
mapping described above.

> write("Original: %s\nLower: %s\nUpper: %s\n",s,lower_case(s),upper_case(s));
Original: Straße
Lower: straße
Upper: STRAßE

It's worth noting, incidentally, that the unusual upper-case form of
the letter (U+1E9E) does lower-case to U+00DF in both Python 3.3 and
Pike 7.9.5:

> lower_case("Stra\u1E9Ee");
(9) Result: "stra\337e"

>>> ord("\u1e9e".lower())
223

So both of them are behaving in a compliant manner, even though their
results are not quite identical.

ChrisA
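
For comparison, here is a minimal sketch of the same checks on the
Python side. It assumes Python 3.3 or later, where str.upper() applies
the full case mapping. The constant names and the describe() helper
are made up for illustration and are not part of any library.

```python
# Assumes Python 3.3+, where str.upper() uses the full case mapping.
SHARP_S = "\u00df"          # LATIN SMALL LETTER SHARP S
CAPITAL_SHARP_S = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S


def describe(text):
    """Hypothetical helper: list the code points of text as U+XXXX."""
    return " ".join("U+%04X" % ord(ch) for ch in text)


word = "Stra" + SHARP_S + "e"

# Full uppercase mapping: one character expands to two ("SS").
print(word.upper())
print(describe(SHARP_S.upper()))

# The capital form lower-cases back to U+00DF (decimal 223).
print(describe(CAPITAL_SHARP_S.lower()))

# The round trip is lossy, because "SS" lower-cases to "ss".
print(word.upper().lower())
```

The round trip is where the two differ most visibly: in the Pike
session above, lower_case(upper_case(s)) still contains U+00DF, while
here the sharp s does not survive upper-casing.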