New submission from Merlijn van Deen:

Steps to reproduce:
-------------------
>>> eval("u'ä'")
# in a UTF-8 console, so this is equivalent to
>>> eval("u'\xc3\xa4'")
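Why these are equivalent: ä is U+00E4, which a UTF-8 terminal sends as the two bytes \xc3\xa4. A quick check (assuming a Python 2.7 session):

>>> u'\xe4'.encode('utf-8')
'\xc3\xa4'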
Actual result:
--------------
u'\xc3\xa4' # i.e.: u'Ã¤'

Expected result:
----------------
SyntaxError: Non-ASCII character '\xc3' in file <string> on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
(which is what would happen if the same code were in a source file)

Or, alternatively:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 2: ordinal not in range(128)
(which is what results from decoding the str with sys.getdefaultencoding())

Instead, the string is interpreted as latin-1. The same happens for ast.literal_eval(), and even when calling compile() directly (a consolidated sketch covering all three follows at the end of this message).

In Python 3.2, this is the result, as utf-8 is used as the default source encoding:

>>> eval(b"'\xc3\xa4'")
'ä'

Workarounds
-----------
>>> eval("# encoding: utf-8\nu'\xc3\xa4'")
u'\xe4'
>>> eval("u'\xc3\xa4'".decode('utf-8'))
u'\xe4'

I understand this might be considered a WONTFIX, as it would change behavior some people might depend on. Nonetheless, documenting this behavior explicitly seems a sensible thing to do.

----------
messages: 196398
nosy: valhallasw
priority: normal
severity: normal
status: open
title: eval() uses latin-1 to decode str
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18870>
_______________________________________
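For completeness, a minimal standalone sketch, assuming a Python 2.7 interpreter (the byte string src below is a stand-in for the UTF-8 console input, not part of the original report), reproducing the latin-1 interpretation across eval(), ast.literal_eval() and compile(), plus the two workarounds:

import ast

src = "u'\xc3\xa4'"  # the bytes a UTF-8 console would send for u'ä'

# All three entry points interpret the non-ASCII bytes as latin-1,
# yielding the two-character u'\xc3\xa4' instead of u'\xe4'.
print repr(eval(src))                               # u'\xc3\xa4'
print repr(ast.literal_eval(src))                   # u'\xc3\xa4'
print repr(eval(compile(src, '<string>', 'eval')))  # u'\xc3\xa4'

# Workarounds from the report: declare an encoding, or decode first.
print repr(eval("# encoding: utf-8\n" + src))       # u'\xe4'
print repr(eval(src.decode('utf-8')))               # u'\xe4'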