In Python 2.5 on Windows I could do [*1]:

# Create a unicode character outside of the BMP.
>>> a = u'\U00010040'

# On Windows it is represented as a surrogate pair.
>>> len(a)
2
>>> a[0], a[1]
(u'\ud800', u'\udc40')

# Create the same character with the unichr() function.
>>> a = unichr(65600)
>>> a[0], a[1]
(u'\ud800', u'\udc40')

# Although the unichr() function works fine, its
# inverse, ord(), doesn't.
>>> ord(a)
TypeError: ord() expected a character, but string of length 2 found

On Python 2.6, unichr() was "fixed" (using the word loosely) so that it too now fails with characters outside the BMP.

>>> a = unichr(65600)
ValueError: unichr() arg not in range(0x10000) (narrow Python build)

Why was this done rather than changing ord() to accept a surrogate pair?

Doesn't this effectively make unichr() and ord() useless on Windows for all but a subset of unicode characters?
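
For reference, the arithmetic that a surrogate-aware ord() and unichr() would need is small. The following is only a sketch of the standard UTF-16 surrogate calculation, not what CPython does internally; the names surrogate_pair, combine_surrogates and wide_ord are hypothetical helpers and are not part of the standard library.

```python
def surrogate_pair(code_point):
    """Split a code point above U+FFFF into (high, low) UTF-16 surrogate values."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    # The top 10 bits go into the high surrogate, the bottom 10 into the low one.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def combine_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into a single code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def wide_ord(text):
    """Hypothetical ord() replacement that also accepts a surrogate pair."""
    if len(text) == 2:
        return combine_surrogates(ord(text[0]), ord(text[1]))
    return ord(text)
```

With these definitions, surrogate_pair(65600) gives (0xD800, 0xDC40), the same pair shown in the session above, and wide_ord(u'\ud800\udc40') gives 65600 back.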