On 9/06/2006 10:04 PM, KvS wrote:
> 2) How do I get a representation of a unic. object in terms of Unicode
> code points? repr() doesn't do that, it sometimes parses or encodes the
> code points right:
>
>|>>> s=u"\u0040\u0166\u00e6"
>|>>> s
> u'@\u0166\xe6'

|>>> ' '.join('U+%04X' % ord(c) for c in s)
'U+0040 U+0166 U+00E6'

If you'd prefer it more Pythonic than unicode.orgic, adjust the format
string and separator to suit your taste.

> (does this latter \xe6 have to do with the internal representation of
> unic. objects, maybe with this UCS-2 encoding?)

No, \xe6 and \u00e6 are just two spellings of the same code point;
repr() uses the shorter \xNN form for anything below U+0100:

|>>> u'\xe6' == u'\u00e6' == unichr(0xe6)
True
|>>> hex(ord(u'\u00e6'))
'0xe6'

U+nnnnnn is represented internally as the integer 0xnnnnnn -- except if
it won't fit, but you can pretend that surrogate pairs don't exist, for
the moment :-)

Cheers,
John
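
P.S. In case it helps, here is a minimal sketch of the "adjust the format
string and separator" idea wrapped up as a function. The name code_points
and its fmt/sep parameters are made up for illustration (they are not from
any library), and I'm assuming it only needs to handle characters that fit
in one code unit, i.e. still pretending surrogate pairs don't exist. It
should behave the same on Python 2 and on Python 3.3+.

```python
def code_points(s, fmt='U+%04X', sep=' '):
    """Format each character of s by its code point (hypothetical helper)."""
    # ord() gives the integer code point of each character;
    # fmt controls how that integer is displayed, sep what goes between items.
    return sep.join(fmt % ord(c) for c in s)


s = u'\u0040\u0166\u00e6'

# unicode.org style
print(code_points(s))                        # U+0040 U+0166 U+00E6

# a more Pythonic style: lower-case hex literals, comma-separated
print(code_points(s, fmt='0x%x', sep=', '))  # 0x40, 0x166, 0xe6
```

Keeping the format and the separator as arguments means the same one-liner
covers both tastes without being rewritten each time.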