[issue19519] Parser: don't transcode input string to UTF-8 if it is already encoded to UTF-8

STINNER Victor Thu, 07 Nov 2013 06:03:48 -0800

STINNER Victor added the comment:

> The parser should check that the input is actually valid UTF-8 data.


Ah yes, correct. It looks like the input data is still checked for
valid UTF-8. I suppose that byte strings have to be decoded from
UTF-8 because Python 3 manipulates Unicode strings, not byte strings.
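
To illustrate the point about validation at the Python level: decoding
bytes as UTF-8 with the default strict error handling is itself a
validity check, so "already UTF-8" input can be verified without
transcoding it from another encoding. This is only a sketch, not the
tokenizer's C code; is_valid_utf8 is a hypothetical helper name.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# Source text encoded as UTF-8: the non-ASCII character becomes two bytes.
source_ok = "x = 'é'\n".encode("utf-8")

# A lone 0xE9 byte (Latin-1 'é') followed by an ASCII quote is not valid UTF-8.
source_bad = b"x = '\xe9'\n"

print(is_valid_utf8(source_ok))
print(is_valid_utf8(source_bad))
```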

The patch only skips the calls to translate_into_utf8(str, tok->encoding);
the calls to translate_into_utf8(str, tok->enc) are unchanged (notice:
encoding != enc :-)).

But it looks like translate_into_utf8(str, tok->enc) is not called if
tok->enc is NULL.

If tok->encoding is "utf-8" and tok->enc is NULL, maybe the input
string is never decoded from UTF-8. That would be strange, though,
because Python uses Unicode strings.

Don't trust me; I would prefer an explanation from Benjamin, who knows
the parser internals better than I do :-)

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue19519>
_______________________________________