[issue18961] Non-UTF8 encoding line

Serhiy Storchaka Mon, 16 Sep 2013 12:05:55 -0700

Serhiy Storchaka added the comment:

What about the first line? Currently both the Python interpreter and the
tokenize module decode it as UTF-8 (actually, due to bug #18960, Python
interprets it twice, in different encodings). PEP 263 says:


    1. The complete Python source file should use a single encoding.
       Embedding of differently encoded data is not allowed and will
       result in a decoding error during compilation of the Python
       source code.

I conclude that the first line should be decoded with the encoding specified in 
the second line. We should first read the first line and check whether it is a 
comment without an encoding cookie. If so, we read the second line, determine 
the encoding, and only then decode the lines read so far. Perhaps this will 
untangle issue18960 too.
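
The steps above could be sketched roughly as follows. This is only an 
illustration, not a patch: read_header, COOKIE_RE and BLANK_RE are hypothetical 
names, the cookie pattern is modelled on the one in PEP 263, and BOM handling 
is left out. The point is that nothing is decoded until the encoding is known.

```python
import codecs
import io
import re

# Hypothetical helpers, matched against raw bytes so that no decoding
# happens before the encoding has been determined.
COOKIE_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")
BLANK_RE = re.compile(rb"^[ \t\f]*(?:[#\r\n]|$)")


def read_header(readline, default="utf-8"):
    """Return (encoding, decoded_lines) for the first one or two lines.

    `readline` must return bytes, like the readline of a binary file.
    Assumption: UTF-8 is the default when no cookie is found.
    """
    encoding = default
    first = readline()
    second = b""
    match = COOKIE_RE.match(first)
    if match:
        # Cookie on the first line: no need to look further.
        encoding = match.group(1).decode("ascii")
    elif BLANK_RE.match(first):
        # First line is a comment (or blank) without a cookie,
        # so the cookie may still be on the second line.
        second = readline()
        match = COOKIE_RE.match(second)
        if match:
            encoding = match.group(1).decode("ascii")
    codecs.lookup(encoding)  # raises LookupError for an unknown encoding
    # Only now decode every line read so far, with a single encoding.
    raw = [line for line in (first, second) if line]
    return encoding, [line.decode(encoding) for line in raw]


if __name__ == "__main__":
    data = b"# caf\xe9\n# -*- coding: latin-1 -*-\nx = 1\n"
    print(read_header(io.BytesIO(data).readline))
```

With such a helper, a non-ASCII byte in a first-line comment is decoded with 
the encoding named on the second line instead of being rejected as invalid 
UTF-8.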

----------
assignee:  -> serhiy.storchaka
stage:  -> needs patch

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue18961>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
