Marc-Andre Lemburg added the comment:

Serhiy: Removing the shortcut would slow down the tokenizer a lot, since UTF-8 
encoded source code is the norm, not the exception.

The "problem" here is that the tokenizer trusts the source code to be in the 
declared encoding when that encoding is utf-8 or iso-8859-1, and then skips the 
usual "decode into unicode, then encode to utf-8" step.

From a purist point of view, you are right, Python should always pass through 
those steps to detect encoding errors, but from a practical point of view, I 
think the optimization is fine.
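To illustrate the trade-off (this is only a sketch, not the actual CPython
tokenizer code, and `validate_source` is a hypothetical helper): the full
"decode, then re-encode" step surfaces invalid byte sequences as errors,
whereas the shortcut passes the raw bytes through untouched.

```python
def validate_source(raw: bytes, encoding: str) -> bytes:
    """Decode the raw source to str, then re-encode to UTF-8.

    Any byte sequence that is invalid in the declared encoding
    raises UnicodeDecodeError here, before tokenization proceeds.
    """
    return raw.decode(encoding).encode("utf-8")


good = "x = 'h\u00e9llo'\n".encode("utf-8")
bad = b"x = '\xff\xfe'\n"  # 0xff is never valid in UTF-8

# The validation round-trip accepts well-formed UTF-8 unchanged...
assert validate_source(good, "utf-8") == good

# ...and rejects malformed input; the shortcut would instead hand
# these bogus bytes straight to the tokenizer.
try:
    validate_source(bad, "utf-8")
except UnicodeDecodeError:
    print("invalid byte sequence detected")
```

Note that for iso-8859-1 the decode step can never fail (every byte value
maps to a code point), so for that encoding the shortcut loses nothing in
terms of error detection.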

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25937>
_______________________________________