John Machin wrote:
> On 9/06/2006 10:04 PM, KvS wrote:
> >
> > 2) How do I get a representation of a unic. object in terms of Unicode
> > code points? repr() doesn't do that, it sometimes parses or encodes the
> > code points right:
> >
> >|>>> s=u"\u0040\u0166\u00e6"
> >|>>> s
> > u'@\u0166\xe6'
>
> |>>> ' '.join('U+%04X' % ord(c) for c in s)
> 'U+0040 U+0166 U+00E6'
>
> If you'd prefer it more Pythonic than unicode.orgic, adjust the format
> string and separator to suit your taste.
>
> > (does this latter \xe6 have to do with the internal representation of
> > unic. objects, maybe with this UCS-2 encoding?)
>
> |>>> u'\xe6' == u'\u00e6' == unichr(0xe6)
> True
> |>>> hex(ord(u'\u00e6'))
> '0xe6'
>
> U+nnnnnn is represented internally as the integer 0xnnnnnn -- except if
> it won't fit, but you can pretend that surrogate pairs don't exist, for
> the moment :-)
>
> Cheers,
> John
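For readers on Python 3, where `str` is the unicode type and `unichr` became `chr`, John's one-liner carries over almost unchanged. The sketch below is not from the thread: the helper names `code_points` and `utf16_units` are hypothetical, and the second helper only illustrates the surrogate pairs John suggests ignoring for now. It encodes to UTF-16 and reads back the 16-bit units, which is roughly what a narrow (UCS-2/UTF-16) Python 2 build exposed for characters above U+FFFF.

```python
# Sketch for Python 3: str is unicode, chr() replaces Python 2's unichr().

def code_points(s):
    """Return the code points of s in U+XXXX notation, space-separated."""
    return ' '.join('U+%04X' % ord(c) for c in s)

def utf16_units(s):
    """Return the UTF-16 code units of s in U+XXXX notation.

    Characters above U+FFFF (outside the Basic Multilingual Plane) come out
    as two units here: a high surrogate followed by a low surrogate.
    """
    data = s.encode('utf-16-be')  # big-endian, no byte-order mark
    units = [int.from_bytes(data[i:i + 2], 'big') for i in range(0, len(data), 2)]
    return ' '.join('U+%04X' % u for u in units)

s = '\u0040\u0166\u00e6'
print(code_points(s))                    # U+0040 U+0166 U+00E6
print('\xe6' == '\u00e6' == chr(0xe6))   # True: three spellings, one code point

t = '\U0001D11E'  # MUSICAL SYMBOL G CLEF, outside the BMP
print(code_points(t))   # U+1D11E (one code point)
print(utf16_units(t))   # U+D834 U+DD1E (a surrogate pair in UTF-16)
```

Since Python 3.3 (PEP 393) every build stores full code points, so `len(t)` is 1 and surrogates only appear if you encode to UTF-16 yourself.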
Thanks to you and Fredrik! What about q1? I know it's silly, since with integers, for example, one doesn't give such an issue any thought at all; it's just that understanding encodings and decodings makes things a bit blurrier to me. It should be the case that a package may do whatever it wants internally (encoding, decoding, etc.) to represent and manipulate unicode strings, but it should always communicate with the outside world via the interchangeable, uniform Python unicode object, right?

-- 
http://mail.python.org/mailman/listinfo/python-list