On Fri, Oct 28, 2011 at 2:05 PM, Fletcher Johnson <flt.john...@gmail.com> wrote:
> If I create a new Unicode object u'\x82\xb1\x82\xea\x82\xcd' how does
> this creation process interpret the bytes in the byte string? Does it
> assume the string represents a utf-16 encoding, a utf-8 encoding,
> etc...?
>
> For reference the string is これは in the 'shift-jis' encoding.

Encodings define how characters are represented as bytes. A u'...' literal holds no bytes to interpret: each \xNN escape in it names the code point U+00NN directly, so no encoding is assumed. What you're probably looking for is a byte string with those hex values in it, which you can then decode into a Unicode string:

>>> a = b'\x82\xb1\x82\xea\x82\xcd'
>>> unicode(a, "shift-jis")  # use 'str' instead of 'unicode' in Python 3
u'\u3053\u308c\u306f'

The u'....' notation is for Unicode strings, which are not encoded in any way. The last line of the above is a valid way of entering that string in your source code, identifying Unicode characters by their code points.

ChrisA
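
P.S. For anyone on Python 3, here is a minimal sketch of the same idea. The variable names (raw, text, codepoints, literal) are made up for illustration, and it assumes the bytes really are Shift-JIS, as stated in the question.

```python
# Python 3 sketch: bytes versus text for the Shift-JIS example in this thread.
# All variable names here are illustrative, not from the original post.

raw = b'\x82\xb1\x82\xea\x82\xcd'   # six bytes; no encoding is attached to them

# Decoding is the step where an encoding gets chosen explicitly.
text = raw.decode('shift-jis')      # same result as str(raw, 'shift-jis')
codepoints = [hex(ord(ch)) for ch in text]

# The same escapes in a text literal are six code points (U+0082, U+00B1, ...),
# not the three Japanese characters.
literal = '\x82\xb1\x82\xea\x82\xcd'

print(ascii(text))                   # ASCII-safe view of the decoded string
print(codepoints)
print(len(literal), literal == text)
```

Encoding the decoded string back with text.encode('shift-jis') should give the original six bytes, which is a quick way to check that the right codec was picked.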