Re: break unichr instead of fix ord?

Mark Tolonen Tue, 25 Aug 2009 20:55:34 -0700

<[email protected]> wrote in message news:2ad21a79-4a6c-42a7-8923-beb304bb5...@v20g2000yqm.googlegroups.com...

In Python 2.5 on Windows I could do [*1]:


 # Create a unicode character outside of the BMP.
 >>> a = u'\U00010040'

 # On Windows it is represented as a surrogate pair.
 >>> len(a)
 2
 >>> a[0],a[1]
 (u'\ud800', u'\udc40')

 # Create the same character with the unichr() function.
 >>> a = unichr (65600)
 >>> a[0],a[1]
 (u'\ud800', u'\udc40')

 # Although the unichr() function works fine, its
 # inverse, ord(), doesn't.
 >>> ord (a)
 TypeError: ord() expected a character, but string of length 2 found
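What the poster wants ord() to do amounts to reversing the UTF-16 rule. If a two-character string is a high surrogate (U+D800 to U+DBFF) followed by a low surrogate (U+DC00 to U+DFFF), the code point is 0x10000 plus the high surrogate's 10 payload bits shifted left by 10, plus the low surrogate's 10 payload bits. The sketch below is a hypothetical helper (`ord_wide` is not a built-in), written in Python 3 syntax, where a str can hold the two surrogates as separate code points just as a narrow build did.

```python
def ord_wide(s):
    """Return the code point of a one-character string, or of a single
    UTF-16 surrogate pair as a narrow build stores non-BMP characters.

    Hypothetical helper for illustration, not part of Python.
    """
    if len(s) == 1:
        return ord(s)
    if len(s) == 2:
        high, low = ord(s[0]), ord(s[1])
        if 0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF:
            # Reassemble the 20-bit offset and add back the 0x10000 base.
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    raise TypeError("expected a character or a surrogate pair, got %r" % (s,))


print(hex(ord_wide('\ud800\udc40')))  # prints: 0x10040
print(ord_wide('A'))                  # prints: 65
```

A stricter ord() could reject anything else of length 2, as this sketch does, so that ordinary two-character strings still raise TypeError.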

On Python 2.6, unichr() was "fixed" (using the word
loosely) so that it too now fails with characters outside
the BMP.

 >>> a = unichr (65600)
 ValueError: unichr() arg not in range(0x10000) (narrow Python build)
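Even where unichr() refuses code points above U+FFFF, the surrogate pair can still be computed by hand: subtract 0x10000, then add the top 10 bits of the remaining 20-bit value to 0xD800 and the bottom 10 bits to 0xDC00. The following is a minimal sketch under that rule; `chr_narrow` is a hypothetical name, and it is written in Python 3 syntax (on a narrow Python 2 build the same arithmetic should work with unichr() in place of chr(), since every value it passes is at most 0xFFFF).

```python
def chr_narrow(cp):
    """Return code point cp as a narrow-build-style string: one character
    in the BMP, or a two-character UTF-16 surrogate pair above U+FFFF.

    Hypothetical helper for illustration (Python 3 syntax).
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range: %#x" % cp)
    if cp < 0x10000:
        return chr(cp)
    offset = cp - 0x10000               # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits -> low surrogate
    return chr(high) + chr(low)


print(ascii(chr_narrow(65600)))  # prints: '\ud800\udc40'
```

This reproduces the pair the poster got from unichr(65600) under Python 2.5.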

Why was this done rather than changing ord() to accept a
surrogate pair?

Doesn't this effectively make unichr() and ord() useless
on Windows for all but a subset of Unicode characters?


Switch to Python 3?

 >>> x = '\U00010040'
 >>> import unicodedata
 >>> unicodedata.name(x)
 'LINEAR B SYLLABLE B025 A2'
 >>> ord(x)
 65600
 >>> hex(ord(x))
 '0x10040'
 >>> unicodedata.name(chr(0x10040))
 'LINEAR B SYLLABLE B025 A2'
 >>> ord(chr(0x10040))
 65600
 >>> print(ascii(chr(0x10040)))
 '\ud800\udc40'
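For comparison, a hedged sketch for current Python 3. The ascii() output in the session above shows two surrogates, which means it ran on a narrow build; since Python 3.3 (PEP 393) there are no narrow builds, and a non-BMP character is always a single code point. The surrogate pair is still reachable through the UTF-16 codec, and the 'surrogatepass' error handler (usable with the UTF-16 codecs since Python 3.4) can merge a stray pair back into one character. The variable names are illustrative only, and bytes.hex() needs Python 3.5 or later.

```python
# Python 3.3+ (PEP 393): a non-BMP character is one code point on every platform.
x = '\U00010040'
print(len(x))                          # prints: 1

# Encode to big-endian UTF-16 (no BOM) to see the surrogate pair as bytes.
utf16 = x.encode('utf-16-be')
print(utf16.hex())                     # prints: d800dc40

# A str holding the two surrogates as separate code points (what a narrow
# build produced) can be rejoined via the 'surrogatepass' error handler.
pair = '\ud800\udc40'
joined = pair.encode('utf-16-be', 'surrogatepass').decode('utf-16-be')
print(joined == x, hex(ord(joined)))   # prints: True 0x10040
```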

-Mark


--
http://mail.python.org/mailman/listinfo/python-list
