Thomas Wouters <tho...@python.org> added the comment:

Py_CompileString() in Python 3.9 and later, using the PEG parser, appears to no longer honour source encoding cookies. A reduced test case:
#include "Python.h"
#include <stdio.h>

const char *src = (
    "# -*- coding: Latin-1 -*-\n"
    "'''\xc3'''\n");

int main(int argc, char **argv)
{
    Py_Initialize();
    PyObject *res = Py_CompileString(src, "some_path", Py_file_input);
    if (res) {
        fprintf(stderr, "Compile succeeded.\n");
        return 0;
    }
    else {
        fprintf(stderr, "Compile failed.\n");
        PyErr_Print();
        return 1;
    }
}

Compiling and running the resulting binary with Python 3.8 (or earlier):

% ./encoding_bug
Compile succeeded.

With 3.9 and PYTHONOLDPARSER=1:

% PYTHONOLDPARSER=1 ./encoding_bug
Compile succeeded.

With 3.9 (without the env var) or 3.10:

% ./encoding_bug
Compile failed.
  File "some_path", line 2
    '''�'''
    ^
SyntaxError: (unicode error) 'utf-8' codec can't decode byte 0xc3 in position 0: unexpected end of data

Writing the same bytes to a file and making python3.9 or python3.10 import them works fine, as does passing the bytes to compile():

Python 3.10.0+ (heads/3.10-dirty:7bac598819, Nov 16 2021, 20:35:12)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b = open('encoding_bug.py', 'rb').read()
>>> b
b"# -*- coding: Latin-1 -*-\n'''\xc3'''\n"
>>> import encoding_bug
>>> encoding_bug.__doc__
'Ã'
>>> co = compile(b, 'some_path', 'exec')
>>> co
<code object <module> at 0x7f447e1b0c90, file "some_path", line 1>
>>> co.co_consts[0]
'Ã'

It's just Py_CompileString() that fails. I don't understand why, and I do believe it's a regression.

----------
nosy: +gregory.p.smith

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45822>
_______________________________________