Ezio Melotti <ezio.melo...@gmail.com> added the comment:

FWIW, on Python3 it seems to work:
>>> import unicodedata
>>> unicodedata.category("\U00010000")
'Lo'
>>> unicodedata.category("\U00011000")
'Cn'
>>> unicodedata.category(chr(0x10000))
'Lo'
>>> unicodedata.category(chr(0x11000))
'Cn'
>>> ord(chr(0x10000)), 0x10000
(65536, 65536)
>>> ord(chr(0x11000)), 0x11000
(69632, 69632)

I'm using a narrow build too:
>>> import sys
>>> sys.maxunicode
65535
>>> len('\U00010000')
2
>>> ord('\U00010000')
65536

On Python2 unichr() is supposed to raise a ValueError on a narrow build if the value is greater than 0xFFFF [1], but since characters above 0xFFFF can be written as u"\Uxxxxxxxx" literals, there should be a way to fix unichr() so that it can return them too. Python3 already does this with chr(). Maybe we should open a new issue for this if there isn't one already.

[1]: http://docs.python.org/library/functions.html#unichr

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5127>
_______________________________________
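
For reference, here is a minimal sketch of the arithmetic a fixed unichr() would need on a narrow build, where a character above 0xFFFF occupies two UTF-16 code units (hence the len() of 2 above). The names surrogate_pair() and utf16_units() are hypothetical helpers made up for this illustration, not existing Python APIs, and the sketch is written for Python3 so that the result can be cross-checked against the utf-16-be codec.

```python
import struct


def surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates for a code point above 0xFFFF.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("expected a code point in range(0x10000, 0x110000)")
    offset = code_point - 0x10000        # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)       # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits go in the low surrogate
    return high, low


def utf16_units(code_point):
    """Cross-check: let the UTF-16 codec produce the two code units."""
    return struct.unpack(">HH", chr(code_point).encode("utf-16-be"))


if __name__ == "__main__":
    for cp in (0x10000, 0x11000, 0x10FFFF):
        print(hex(cp), [hex(unit) for unit in surrogate_pair(cp)])
```

On a Python2 narrow build, unichr() could presumably return unichr(high) + unichr(low) for such values instead of raising ValueError. That is an assumption about one possible fix, not a description of any existing patch.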