break unichr instead of fix ord?

rurpy Tue, 25 Aug 2009 12:52:28 -0700

In Python 2.5 on Windows I could do [*1]:

  # Create a unicode character outside of the BMP.
  >>> a = u'\U00010040'


  # On Windows it is represented as a surrogate pair.
  >>> len(a)
  2
  >>> a[0],a[1]
  (u'\ud800', u'\udc40')

  # Create the same character with the unichr() function.
  >>> a = unichr (65600)
  >>> a[0],a[1]
  (u'\ud800', u'\udc40')

  # Although the unichr() function works fine, its
  # inverse, ord(), doesn't.
  >>> ord (a)
  TypeError: ord() expected a character, but string of length 2 found

On Python 2.6, unichr() was "fixed" (using the word
loosely) so that it too now fails with characters outside
the BMP.

  >>> a = unichr (65600)
  ValueError: unichr() arg not in range(0x10000) (narrow Python build)

Why was this done rather than changing ord() to accept a
surrogate pair?

Doesn't this effectively make unichr() and ord() useless
on Windows for all but a subset of Unicode characters?
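
For illustration, here is a rough sketch of the surrogate-pair
arithmetic that a pair-aware ord() and unichr() would need. The
names pair_to_codepoint, codepoint_to_pair and wide_ord are
hypothetical helpers made up for this example; they are not part
of Python, and this is not a claim about how the interpreter
implements either builtin.

```python
def pair_to_codepoint(high, low):
    """Combine two UTF-16 surrogate code units (ints) into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    # Each surrogate carries 10 bits of the offset above 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def codepoint_to_pair(cp):
    """Split a code point outside the BMP into (high, low) surrogates."""
    if not (0x10000 <= cp <= 0x10FFFF):
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def wide_ord(s):
    """Hypothetical ord() that also accepts a two-unit surrogate pair."""
    if len(s) == 2:
        return pair_to_codepoint(ord(s[0]), ord(s[1]))
    # Anything else is left to the builtin, including its error cases.
    return ord(s)
```

With these helpers, codepoint_to_pair(65600) gives (0xD800, 0xDC40),
the same pair shown in the session above, and wide_ord() on that
two-unit string gives back 65600.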