On Wed, Dec 19, 2012 at 1:55 PM, <wxjmfa...@gmail.com> wrote:
> Yes, it is correct (or can be considered as correct).
> I do not wish to discuss the typographical problematic
> of "Das Grosse Eszett". The web is full of pages on the
> subject. However, I never succeeded to find an "official
> position" from Unicode. The best information I found seem
> to indicate (to converge), U+1E9E is now the "supported"
> uppercase form of U+00DF. (see DIN).

Is this link not official?

http://unicode.org/cldr/utility/character.jsp?a=00DF

That defines a full uppercase mapping to SS and a simple uppercase
mapping to U+00DF itself, not U+1E9E. My understanding is that the
simple mapping is not allowed to map to multiple characters, whereas
the full mapping is.

> What is bothering me, is more the implementation. The Unicode
> documentation says roughly this: if something can not be
> honoured, there is no harm, but do not implement a workaround.
> In that case, I'm not sure Python is doing the best.

But this behavior is per the specification, not a workaround. I think
the worst thing we could do in this regard would be to start diverging
from the specification because we think we know better than the
Unicode Consortium.

> If "wrong", this can be considered as programmatically correct
> or logically acceptable (Py3.2)
>
> >>> 'Straße'.upper().lower().capitalize() == 'Straße'
> True
>
> while this will *always* be problematic (Py3.3)
>
> >>> 'Straße'.upper().lower().capitalize() == 'Straße'
> False

On the other hand (Py3.2):

>>> 'Straße'.upper().isupper()
False

vs. Py3.3:

>>> 'Straße'.upper().isupper()
True

There is probably no one clearly correct way to handle the problem,
but personally this contradiction bothers me more than the example
that you posted.
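For anyone who wants to poke at this themselves, here is a small sketch
that collects the checks discussed above in one place. It assumes
Python 3.3 or later, where str.upper() applies the full case mappings.
The names round_trip and case_report are mine, purely for illustration;
only the str methods and unicodedata.name come from the standard
library.

```python
import unicodedata

ESZETT = "\u00df"          # LATIN SMALL LETTER SHARP S
CAPITAL_ESZETT = "\u1e9e"  # LATIN CAPITAL LETTER SHARP S


def round_trip(word):
    """Hypothetical helper: the upper -> lower -> capitalize chain from the thread."""
    return word.upper().lower().capitalize()


def case_report(word="Stra\u00dfe"):
    """Collect the case-mapping results discussed in the thread (Python 3.3+)."""
    upper = word.upper()
    return {
        # Full uppercase mapping: U+00DF becomes the two letters "SS".
        "upper": upper,
        # With full mapping, the uppercased string contains no lowercase letters.
        "upper_isupper": upper.isupper(),
        # The round trip loses the sharp s, so it does not compare equal.
        "round_trip_equal": round_trip(word) == word,
        # U+1E9E still lowercases to U+00DF, even though the reverse
        # mapping does not produce U+1E9E.
        "capital_eszett_lower": CAPITAL_ESZETT.lower(),
        "capital_eszett_name": unicodedata.name(CAPITAL_ESZETT),
        # Case folding maps both spellings to the same string.
        "casefold_equal": word.casefold() == upper.casefold(),
    }


if __name__ == "__main__":
    for key, value in case_report().items():
        print(key, "=", repr(value))
```

If the goal is a caseless comparison rather than a lossless round trip,
str.casefold() (also new in 3.3) is probably the better tool, since it
treats 'Straße' and 'STRASSE' as equal.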