New submission from Jan Kaliszewski:

It seems that the 'raw_unicode_escape' codec:

1) produces data that could be suitable for Python 2.x raw unicode string literals but not for Python 3.x raw string literals (in Python 3.x, \u... escapes in raw literals are treated literally as well);

2) seems to be buggy anyway: characters in the range 128-255 are encoded as single bytes using the 'latin-1' encoding (in Python 3.x this is definitely a bug; even in Python 2.x the feature is dubious, although at least Py2's eval() and compile() functions officially accept 'latin-1'-encoded byte strings...).

Python 3.3:

>>> b = "zażółć".encode('raw_unicode_escape')
>>> literal = b'r"' + b + b'"'
>>> literal
b'r"za\\u017c\xf3\\u0142\\u0107"'
>>> eval(literal)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xf3 in position 8: invalid continuation byte
>>> b'\xf3'.decode('latin-1')
'ó'
>>> b = "zaż".encode('raw_unicode_escape')
>>> literal = b'r"' + b + b'"'
>>> literal
b'r"za\\u017c"'
>>> eval(literal)
'za\\u017c'
>>> print(eval(literal))
za\u017c

I believe that the 'raw_unicode_escape' codec should either be deprecated and later removed, or be modified to accept only printable ASCII characters.

PS. Also, as a side note: neither 'raw_unicode_escape' nor 'unicode_escape' escapes quotes (see issue #7615) -- shouldn't that at least be documented explicitly?

----------
components: Library (Lib), Unicode
messages: 202505
nosy: ezio.melotti, haypo, zuo
priority: normal
severity: normal
status: open
title: The 'raw_unicode_escape' codec buggy + not apropriate for Python 3.x
versions: Python 3.4, Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19539>
_______________________________________
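
For anyone who wants to reproduce the encoding behaviour described in the report without going through eval(), the following is a minimal sketch for Python 3. The helper names encode_raw and is_valid_utf8 are hypothetical and exist only for this illustration; the only library calls used are str.encode() and bytes.decode().

```python
def encode_raw(text):
    """Hypothetical helper: encode text with the 'raw_unicode_escape' codec."""
    return text.encode("raw_unicode_escape")


def is_valid_utf8(data):
    """Hypothetical helper: report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


encoded = encode_raw("zażółć")

# Code points above 255 become ASCII \uXXXX escapes, while 'ó' (U+00F3)
# is emitted as the single Latin-1 byte 0xf3.
print(encoded)

# The lone 0xf3 byte means the output is not valid UTF-8, which is the
# default source encoding assumed when Python 3 compiles bytes.
print(is_valid_utf8(encoded))

# The codec still round-trips its own output.
print(encoded.decode("raw_unicode_escape") == "zażółć")

# Without any character in the range 128-255 the output is plain ASCII.
print(is_valid_utf8(encode_raw("zaż")))
```

The script only demonstrates the mixed Latin-1 / escape output; it does not attempt to show what a fixed codec should produce.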