On Sun, May 10, 2015 5:53 PM CEST Somelauw . wrote:
>In Python 3, decoding "€" with unicode-escape returns 'â\x82¬' which in my
>opinion doesn't make sense.
>The € already is decoded; if it were encoded it would look like this:
>'\u20ac'.
>So why is it doing this?
>
>In Python 2 the behaviour is similar, but slightly different.
>
>$ python3 -S
>Python 3.3.3 (default, Nov 27 2013, 17:12:35)
>[GCC 4.8.2] on linux
>>> import codecs
>>> codecs.decode('€', 'unicode-escape')
>'â\x82¬'
>>> codecs.encode('€', 'unicode-escape')
>b'\\u20ac'
>>>
>
>$ python2 -S
>Python 2.7.5+ (default, Sep 17 2013, 15:31:50)
>[GCC 4.8.1] on linux2
>>> import codecs
>>> codecs.decode('€', 'unicode-escape')
>u'\xe2\x82\xac'
>>> codecs.encode('€', 'unicode-escape')
>Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0:
>ordinal not in range(128)
>>>

Hi,

I only have Python 2 on my phone, but I am surprised that you decode (and are able to decode) unicode strings. What result do you get when you do the following in Python 3:

Python 2.7.2 (default, Oct 25 2014, 20:52:15)
[GCC 4.9 20140827 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import codecs
>>> codecs.decode(b'€', 'unicode-escape')
u'\xe2\x82\xac'
>>> codecs.encode(u'€', 'unicode-escape')
'\\xe2\\x82\\xac'
>>>
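
For comparison, here is a minimal Python 3 sketch of what I would expect to see when the bytes are spelled out explicitly. It assumes that a str handed to codecs.decode() is turned into its UTF-8 bytes before the codec runs, which would explain the output quoted above. The variable names are only illustrative.

```python
# Sketch for Python 3: unicode-escape converts between str and bytes.
euro = '\u20ac'  # the euro sign

# Encoding goes str -> bytes and yields the ASCII escape sequence.
escaped = euro.encode('unicode-escape')
print(escaped)  # b'\\u20ac'

# Decoding goes bytes -> str and turns the escape back into the character.
roundtrip = escaped.decode('unicode-escape')
print(roundtrip == euro)  # True

# Assumption: decoding a str first reduces it to its UTF-8 bytes.
utf8_bytes = euro.encode('utf-8')
print(utf8_bytes)  # b'\xe2\x82\xac'

# Bytes that are not part of a backslash escape are read as Latin-1,
# so each UTF-8 byte becomes its own character.
mojibake = utf8_bytes.decode('unicode-escape')
print(ascii(mojibake))  # '\xe2\x82\xac'
print(mojibake == utf8_bytes.decode('latin-1'))  # True
```

If that assumption holds, the 'â\x82¬' result is simply the three UTF-8 bytes of the euro sign, each shown as a separate Latin-1 character.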