Marc-Andre Lemburg added the comment:

Serhiy: Removing the shortcut would slow down the tokenizer a lot, since UTF-8 
encoded source code is the norm, not the exception.

The "problem" here is that the tokenizer trusts the source code to be in the 
declared encoding when that encoding is utf-8 or iso-8859-1, and then skips the 
usual "decode into unicode, then encode to utf-8" step.

From a purist point of view, you are right, Python should always pass through 
those steps to detect encoding errors, but from a practical point of view, I 
think the optimization is fine.
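To illustrate the trade-off (this is only a sketch, not the actual CPython
tokenizer code, and `validate_source` is a hypothetical helper): the full
"decode, then re-encode" step surfaces invalid byte sequences as errors,
whereas the shortcut passes the raw bytes through untouched.

```python
def validate_source(raw: bytes, encoding: str) -> bytes:
    """Decode the raw source to str, then re-encode to UTF-8.

    Any byte sequence that is invalid in the declared encoding
    raises UnicodeDecodeError here, before tokenization proceeds.
    """
    return raw.decode(encoding).encode("utf-8")


good = "x = 'h\u00e9llo'\n".encode("utf-8")
bad = b"x = '\xff\xfe'\n"  # 0xff is never valid in UTF-8

# The validation round-trip accepts well-formed UTF-8 unchanged...
assert validate_source(good, "utf-8") == good

# ...and rejects malformed input; the shortcut would instead hand
# these bogus bytes straight to the tokenizer.
try:
    validate_source(bad, "utf-8")
except UnicodeDecodeError:
    print("invalid byte sequence detected")
```

Note that for iso-8859-1 the decode step can never fail (every byte value
maps to a code point), so for that encoding the shortcut loses nothing in
terms of error detection.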

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25937>
_______________________________________