New submission from Jean-Paul Calderone <exar...@divmod.com>:

This issue may affect more than unicode.upper() and unicode.lower(), but it is 
clearly visible with these two methods.

For example, consider DESERET SMALL LETTER EW (U+1043F).  On a UTF-16 build, 
calling upper() on a string containing this character does not convert it to 
the capital form (DESERET CAPITAL LETTER EW):

>>> u'\N{DESERET SMALL LETTER EW}'.upper() == u'\N{DESERET SMALL LETTER EW}'
True

The character is not even recognized as lowercase:

>>> u'\N{DESERET SMALL LETTER EW}'.islower()
False

A UTF-32 build, however, gives the expected behavior (i.e., the behavior one 
would get for a BMP code point that has small and capital variants).

----------
components: Interpreter Core
messages: 97500
nosy: exarkun
severity: normal
status: open
title: UTF-16 build incorrectly translates cases for non-BMP code points
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7663>
_______________________________________