Marc-Andre Lemburg added the comment:

Serhiy: Removing the shortcut would slow down the tokenizer a lot, since UTF-8 encoded source code is the norm, not the exception.
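As a rough illustration of the two paths being compared: the general path decodes the source bytes to unicode and re-encodes them as UTF-8, which also validates them, while the shortcut hands bytes declared as UTF-8 through unchanged. This is only a sketch in Python. The real tokenizer is written in C, the function names `to_utf8_checked` and `to_utf8_shortcut` are hypothetical, and only the utf-8 case of the shortcut is modelled here.

```python
def to_utf8_checked(source: bytes, encoding: str) -> bytes:
    """General path: decode to str, then re-encode as UTF-8.

    Decoding raises UnicodeDecodeError on bytes that are invalid
    in the declared encoding, so errors are detected here.
    """
    return source.decode(encoding).encode("utf-8")


def to_utf8_shortcut(source: bytes, encoding: str) -> bytes:
    """Illustrative shortcut: trust bytes that are declared as UTF-8.

    Hypothetical model of the optimization, not CPython's actual code.
    """
    if encoding.lower() == "utf-8":
        return source  # returned as-is, no validation
    return to_utf8_checked(source, encoding)


good = "s = 'é'\n".encode("utf-8")
bad = b"s = '\xff'\n"  # 0xff can never appear in valid UTF-8

# Valid input: both paths produce the same bytes.
same_on_good = to_utf8_shortcut(good, "utf-8") == to_utf8_checked(good, "utf-8")

# Invalid input: the shortcut passes it through, the checked path rejects it.
shortcut_passes_bad = to_utf8_shortcut(bad, "utf-8") == bad
try:
    to_utf8_checked(bad, "utf-8")
except UnicodeDecodeError:
    checked_rejects_bad = True
else:
    checked_rejects_bad = False
```

The sketch shows the trade-off only: the shortcut does no work for the common case, at the price of not noticing invalid bytes at this stage.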
The "problem" here is that the tokenizer trusts the source code to be in the correct encoding when you use one of utf-8 or iso-8859-1, and then skips the usual "decode into unicode, then encode to utf-8" step.

From a purist point of view, you are right: Python should always pass through those steps to detect encoding errors. But from a practical point of view, I think the optimization is fine.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue25937>
_______________________________________