Re: Unicode codepoints

jmfauth Wed, 22 Jun 2011 03:16:24 -0700

That seems to me correct.

>>> '\\u{:04x}'.format(ord(u'é'))
\u00e9
>>> '\\U{:08x}'.format(ord(u'é'))
\U000000e9
>>>


because

>>> u'\U00e9'
  File "<eta last command>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes
in position 0-5: end of string in escape sequence
>>> u'\U000000e9'
é
>>> u'\u00e9'
é
>>>

from this:

>>> u'éléphant\N{EURO SIGN}'
éléphant
>>> u = u'éléphant\N{EURO SIGN}'
>>> ''.join(['\\u{:04x}'.format(ord(c)) for c in u])
\u00e9\u006c\u00e9\u0070\u0068\u0061\u006e\u0074\u20ac
>>>

Skipping surrogate pairs is a little bit a non sense,
because the purpose is to display code points!
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Unicode codepoints

Reply via email to