Re: Unicode literals and byte string interpretation.

Chris Angelico Thu, 27 Oct 2011 20:44:12 -0700

On Fri, Oct 28, 2011 at 2:05 PM, Fletcher Johnson <flt.john...@gmail.com> wrote:
> If I create a new Unicode object u'\x82\xb1\x82\xea\x82\xcd' how does
> this creation process interpret the bytes in the byte string? Does it
> assume the string represents a utf-16 encoding, a utf-8 encoding,
> etc...?
>
> For reference the string is これは in the 'shift-jis' encoding.


Encodings define how characters are represented as bytes. What you're
probably looking for is a byte string containing those hex values,
which you can then decode into a Unicode string:

>>> a = b'\x82\xb1\x82\xea\x82\xcd'
>>> unicode(a,"shift-jis")    # use 'str' instead of 'unicode' in Python 3
u'\u3053\u308c\u306f'
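For Python 3, a minimal equivalent of the session above might look like the sketch below. It assumes Python 3, where str is already a Unicode type and bytes.decode() does the job of unicode(); the variable names are illustrative only. It also encodes the same three characters with a few other codecs, to show that the byte sequence depends entirely on which encoding is chosen:

```python
# Python 3: the six bytes from the question, as a bytes object
raw = b'\x82\xb1\x82\xea\x82\xcd'

# Decode them with the codec that produced them
# (equivalent to str(raw, 'shift_jis'))
text = raw.decode('shift_jis')
print(ascii(text))                   # '\u3053\u308c\u306f'
print([hex(ord(c)) for c in text])   # ['0x3053', '0x308c', '0x306f']

# The same three characters encoded with other codecs give different bytes
for codec in ('shift_jis', 'utf-8', 'utf-16-le'):
    print(codec, text.encode(codec))
```

ascii() is used only so the output prints safely on consoles that can't display Japanese; print(text) shows the characters themselves.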

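The u'...' notation is for Unicode strings, which are sequences of
code points and are not encoded in any way. Inside such a literal,
\x82 means the character U+0082, not a byte, so
u'\x82\xb1\x82\xea\x82\xcd' is six characters in the range U+0080 to
U+00FF, not これは. The last line of the session above is a valid way
of entering the string you actually want in your source code,
identifying each Unicode character by its code point.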
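The following Python 3 sketch inspects the literal from the question (Python 3 still accepts the u prefix) and shows that its \x escapes became code points rather than bytes. The recovery step uses Latin-1, which maps U+0000 to U+00FF one-to-one onto bytes 0x00 to 0xFF. That is a workaround for a string that was built by mistake, not something from the reply above; the variable names are hypothetical.

```python
# The literal from the question: \x escapes are code points, not bytes
mistaken = u'\x82\xb1\x82\xea\x82\xcd'
print(len(mistaken))                        # 6
print([hex(ord(c)) for c in mistaken])      # ['0x82', '0xb1', '0x82', '0xea', '0x82', '0xcd']

# Latin-1 maps each code point below 256 to the byte with the same value,
# so encoding with it recovers the original byte sequence
raw = mistaken.encode('latin-1')
print(raw)                                  # b'\x82\xb1\x82\xea\x82\xcd'

# Those bytes can now be decoded with the codec that produced them
fixed = raw.decode('shift_jis')
print(ascii(fixed))                         # '\u3053\u308c\u306f'

# Same text as writing the code points directly in a literal
print(fixed == u'\u3053\u308c\u306f')       # True
```

The cleaner fix is to never build the Unicode string from bytes in the first place: keep the data as bytes until you know its encoding, then decode once.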

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
