New submission from Jean-Paul Calderone <exar...@divmod.com>:

This issue may extend beyond just unicode.upper() and unicode.lower(), but it's very clear with these two methods, at least.
For example, consider DESERET SMALL LETTER EW. On a UTF-16 build, calling upper() on a string containing this character doesn't change it to the capital variation (DESERET CAPITAL LETTER EW):

>>> u'\N{DESERET SMALL LETTER EW}'.upper() == u'\N{DESERET SMALL LETTER EW}'
True

It can also be seen that this isn't even recognized as lower case:

>>> u'\N{DESERET SMALL LETTER EW}'.islower()
False

With a UTF-32 build, however, the expected behavior (i.e., the behavior one would get for a code point in the BMP with small and capital variations) is provided.

----------
components: Interpreter Core
messages: 97500
nosy: exarkun
severity: normal
status: open
title: UTF-16 build incorrectly translates cases for non-BMP code points
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue7663>
_______________________________________
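
A likely explanation, which the report itself does not state: a UTF-16 build stores a non-BMP code point as a surrogate pair, and the case methods look at each 16-bit code unit separately. The sketch below is written for Python 3 (3.3 or later), so it illustrates the mechanism rather than reproducing the 2.7 behavior. The helper utf16_code_units is a hypothetical name introduced only for this illustration.

```python
import unicodedata

def utf16_code_units(ch):
    """Return the UTF-16 code units for one code point (hypothetical helper)."""
    cp = ord(ch)
    if cp < 0x10000:
        # BMP code points fit in a single 16-bit unit.
        return [cp]
    # Non-BMP code points are split into a high and a low surrogate.
    cp -= 0x10000
    return [0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF)]

small_ew = "\N{DESERET SMALL LETTER EW}"
capital_ew = "\N{DESERET CAPITAL LETTER EW}"

# What a UTF-16 build would hold in memory for this one character.
units = utf16_code_units(small_ew)

# Seen one unit at a time, each surrogate is category "Cs" and has no
# case mapping, so upper() leaves it alone and it is not lower case.
unit_categories = [unicodedata.category(chr(u)) for u in units]
units_unchanged = all(chr(u).upper() == chr(u) for u in units)
units_lower = any(chr(u).islower() for u in units)

# Seen as a whole code point, the mapping exists.
whole_upper = small_ew.upper()
```

Here whole_upper compares equal to capital_ew, while the per-unit view leaves both surrogates unchanged, which matches the difference the report describes between the UTF-32 and UTF-16 builds.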