2015-05-10 18:06 GMT+02:00 Chris Angelico <ros...@gmail.com>:
> Whenever you start encoding and decoding, you need to know whether
> you're working with bytes->text, text->bytes, or something else. In
> the case of unicode-escape, it expects to encode text into bytes, as
> you can see with your second example - you give it a Unicode string,
> and get back a byte string. When you attempt to *decode* a Unicode
> string, that doesn't actually make sense, so it first gets *encoded*
> to bytes, before being decoded. What you're actually seeing there is
> that the one-character string is being encoded into a three-byte UTF-8
> sequence, and then the unicode-escape decode takes those bytes and
> interprets them as characters; as it happens, that's equivalent to a
> Latin-1 decode:

Thanks for your response. I was using unicode-escape to handle escape
sequences, for example converting "\\n" to an actual newline. My input
argument is already a string; the decoding from bytes to string has
already been done a couple of layers deeper, so I really need a
string-to-string conversion.

I guess that it's not possible to do this operation without converting
to bytes first (even if I use the codecs module, it will convert to
bytes implicitly, as you just explained). What I'm probably going to do
is write my own parser to perform this task.
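
A rough sketch of what such a parser might look like, assuming only a handful of escapes matter (\n, \t, \r, \\, quotes, \xNN and \uNNNN). The name `unescape` and the escape table are hypothetical choices for illustration, not part of the codecs module or any other library. It works on the string directly, so non-ASCII characters never pass through bytes:

```python
import re

# Hypothetical table: only the single-character escapes this sketch supports.
_SIMPLE_ESCAPES = {
    "n": "\n",
    "t": "\t",
    "r": "\r",
    "\\": "\\",
    '"': '"',
    "'": "'",
}

# A backslash followed by \uNNNN, \xNN, or any single character.
_ESCAPE_RE = re.compile(r"\\(u[0-9a-fA-F]{4}|x[0-9a-fA-F]{2}|.)", re.DOTALL)


def unescape(text):
    """Replace the supported backslash escapes in a str, returning a str."""

    def replace(match):
        body = match.group(1)
        # Hex forms: 'u' or 'x' followed by the digits matched above.
        if body[0] in "ux" and len(body) > 1:
            return chr(int(body[1:], 16))
        # Unknown escapes are left exactly as they were written.
        return _SIMPLE_ESCAPES.get(body, match.group(0))

    return _ESCAPE_RE.sub(replace, text)


print(repr(unescape("caf\u00e9\\n")))
```

Leaving unknown escapes untouched is a design choice for this sketch; raising an error would be just as reasonable if the input is expected to be well-formed.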