STINNER Victor added the comment:

> The parser should check that the input is actually valid UTF-8 data.
Ah yes, correct. It looks like the input is still checked for valid UTF-8. I suppose that the byte strings should be decoded from UTF-8, because Python 3 manipulates Unicode strings, not byte strings.

The patch only skips calls to translate_into_utf8(str, tok->encoding); calls to translate_into_utf8(str, tok->enc) are unchanged (notice: encoding != enc :-)). But it looks like translate_into_utf8(str, tok->enc) is not called if tok->enc is NULL. So if tok->encoding is "utf-8" and tok->enc is NULL, maybe the input string is not decoded from UTF-8. That sounds strange, though, because Python uses Unicode strings.

Don't trust me: I would prefer an explanation from Benjamin, who knows the parser internals better than I do :-)

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19519>
_______________________________________
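For illustration only, here is a minimal Python-level sketch of the kind of check discussed in this comment: deciding whether a byte string is valid UTF-8 by attempting a strict decode. It is not the tokenizer's C code and says nothing about how translate_into_utf8() behaves; the helper name is_valid_utf8 is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        # bytes.decode() uses errors="strict" by default, so any
        # malformed sequence raises UnicodeDecodeError.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    good = "x = 'é'\n".encode("utf-8")  # source text encoded as UTF-8
    bad = b"x = '\xe9'\n"               # the same text encoded as Latin-1
    print(is_valid_utf8(good))
    print(is_valid_utf8(bad))
```

The second input fails because 0xE9 starts a multi-byte UTF-8 sequence but is followed by an ASCII quote rather than continuation bytes.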