STINNER Victor <victor.stin...@haypocalc.com> added the comment:

Support for characters outside the Unicode BMP (code > 0xffff) is incomplete in narrow builds (sizeof(Py_UNICODE) == 2) of Python 2:

$ ./python
Python 2.7b2+ (trunk:81139M, May 13 2010, 18:45:37)
>>> x=u'\U00010000'
>>> x[0], x[1]
(u'\ud800', u'\udc00')
>>> len(x)
2
>>> ord(x)
Traceback (most recent call last):
  ...
TypeError: ord() expected a character, but string of length 2 found
>>> unichr(0x10000)
Traceback (most recent call last):
  ...
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

It looks better in Python 3:

$ ./python
Python 3.2a0 (py3k:81137:81138, May 13 2010, 18:50:51)
>>> x='\U00010000'
>>> x[0], x[1]
('\ud800', '\udc00')
>>> len(x)
2
>>> ord(x)
65536
>>> chr(0x10000)
'\U00010000'

As for the issue itself, the problem is in the u_set() function. It should use PyUnicode_AsWideChar(), but PyUnicode_AsWideChar() doesn't support surrogates, whereas PyUnicode_FromWideChar() does.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue8670>
_______________________________________
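
For reference, the ('\ud800', '\udc00') pair seen for U+10000 in the narrow-build sessions follows from the standard UTF-16 surrogate arithmetic. The snippet below is only a minimal sketch of that arithmetic, not code from u_set() or from CPython; to_surrogates() and from_surrogates() are hypothetical helper names made up for this illustration.

```python
def to_surrogates(code_point):
    """Split a code point above the BMP into a UTF-16 (high, low) pair.

    Hypothetical helper for illustration only; not a CPython API.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return high, low


def from_surrogates(high, low):
    """Combine a UTF-16 surrogate pair back into a single code point.

    Hypothetical helper for illustration only; not a CPython API.
    """
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x10000)
print(hex(high), hex(low))              # 0xd800 0xdc00
print(hex(from_surrogates(high, low)))  # 0x10000
```

On a narrow build, len(x) == 2 because the string stores these two code units separately; ord() in Python 3 effectively performs the from_surrogates() step, which the Python 2 session above does not.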