Re: Short questions wrt Python & Unicode

John Machin Fri, 09 Jun 2006 06:05:50 -0700

On 9/06/2006 10:04 PM, KvS wrote:

> 2) How do I get a representation of a unic. object in terms of Unicode
> code points? repr() doesn't do that, it sometimes parses or encodes the
> code points right:
> 
>|>>> s=u"\u0040\u0166\u00e6"
>|>>> s
> u'@\u0166\xe6'


|>>> ' '.join('U+%04X' % ord(c) for c in s)
'U+0040 U+0166 U+00E6'

If you'd prefer it more Pythonic than unicode.orgic, adjust the format 
string and separator to suit your taste.
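For readers on Python 3 (an assumption: the post itself is Python 2, where Unicode literals need the `u` prefix), the same idea is wrapped below in two small helpers. The names `code_points` and `python_escapes` are hypothetical, chosen for illustration. The second helper takes the "more Pythonic" route and emits `\uXXXX` escapes, switching to `\UXXXXXXXX` above U+FFFF.

```python
def code_points(s, sep=' '):
    # unicode.org style: "U+" plus at least four upper-case hex digits
    return sep.join('U+%04X' % ord(c) for c in s)

def python_escapes(s):
    # Python-literal style: \uXXXX inside the BMP, \UXXXXXXXX above U+FFFF
    return ''.join(('\\u%04x' if ord(c) <= 0xFFFF else '\\U%08x') % ord(c)
                   for c in s)

s = '\u0040\u0166\u00e6'
print(code_points(s))                  # U+0040 U+0166 U+00E6
print(python_escapes(s))               # \u0040\u0166\u00e6
# The escaped form round-trips through the 'unicode_escape' codec.
print(python_escapes(s).encode('ascii').decode('unicode_escape') == s)  # True
```

Because `%04X` is a minimum width, code points beyond U+FFFF simply get more digits (U+1F600), which matches the unicode.org convention.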

> (does this latter \xe6 have to do with the internal representation of
> unic. objects, maybe with this UCS-2 encoding?)

|>>> u'\xe6' == u'\u00e6' == unichr(0xe6)
True
|>>> hex(ord(u'\u00e6'))
'0xe6'
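On Python 3 (again an assumption about the reader's version), `repr()` shows printable non-ASCII characters as-is, but `ascii()` escapes them in the same `\x` / `\u` / `\U` style that Python 2's `repr()` used for unicode objects. That makes it easy to see that `\xe6` versus `\u0166` is only the shortest escape that fits the number, not a clue about internal storage. `chr()` is Python 3's replacement for `unichr()`.

```python
s = '\u0040\u0166\u00e6'

# Three spellings of one code point, U+00E6 (chr() is Python 3's unichr()).
print('\xe6' == '\u00e6' == chr(0xE6))   # True

# ascii() escapes non-ASCII characters using the shortest form that fits:
# \xNN below 0x100, \uNNNN up to 0xFFFF, \UNNNNNNNN above that.
print(ascii(s))                          # '@\u0166\xe6'

# Underneath, each character is just an integer code point.
print([hex(ord(c)) for c in s])          # ['0x40', '0x166', '0xe6']
```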

U+nnnnnn is represented internally as the integer 0xnnnnnn, except when 
it won't fit: a narrow (UCS-2) build stores code points above U+FFFF as a 
surrogate pair of two 16-bit units. But you can pretend that surrogate 
pairs don't exist, for the moment :-)
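To peek behind the surrogate-pair curtain anyway: a narrow build stored a code point above U+FFFF as two 16-bit units, computed the same way UTF-16 does. The sketch below is Python 3, where (since 3.3, PEP 393) `str` always holds whole code points, so `len('\U0001F600') == 1`. The helper names are hypothetical, and the arithmetic is cross-checked against the built-in `utf-16-be` codec rather than any narrow-build internals.

```python
def to_surrogate_pair(cp):
    """Split a code point above U+FFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError('code point must be in U+10000..U+10FFFF')
    v = cp - 0x10000                 # 20-bit offset into the supplementary planes
    high = 0xD800 + (v >> 10)        # top 10 bits go in the high surrogate
    low = 0xDC00 + (v & 0x3FF)       # bottom 10 bits go in the low surrogate
    return high, low

def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1F600)
print('U+%04X U+%04X' % (high, low))          # U+D83D U+DE00
# Cross-check against Python's own UTF-16 codec (big-endian, no BOM).
print('\U0001F600'.encode('utf-16-be').hex()) # d83dde00
```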

Cheers,
John

-- 
http://mail.python.org/mailman/listinfo/python-list
