New submission from STINNER Victor <victor.stin...@haypocalc.com>:

It looks like the parser API (e.g. PyParser_ParseFileFlagsEx, PyParser_ASTFromFile) expects a utf-8 filename: err_input() decodes the filename from utf-8. But
Example in a non-ascii directory (/home/SHARE/SVN/py3kéŁ) and an ascii locale:
----
$ LANG= ./python -c "import inspect"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/SHARE/SVN/py3k\xe9\u0141/Lib/inspect.py", line 1
SyntaxError: encoding problem: with BOM
----

The problem occurs in fp_setreadl(): this function reopens the file with the right encoding. But to open the file, the bytes filename is decoded from utf-8 (in strict mode), whereas the filename (in my example) contains surrogates and utf-8 in strict mode rejects surrogates.

To support undecodable filenames in the parser API, we have two solutions:

 * Use the filesystem encoding with surrogateescape (PyUnicode_EncodeFSDefault, PyUnicode_DecodeFSDefault)
 * Use utf-8 in another mode: surrogateescape or surrogatepass

The parser API has many public functions, and we have to consider the compatibility with Python 3.1.

See also #9713 and #8611.

----------
components: Interpreter Core, Unicode
messages: 118604
nosy: haypo
priority: normal
severity: normal
status: open
title: Support undecodable filenames in the parser API
versions: Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10095>
_______________________________________
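
For illustration only, here is a minimal Python-level sketch of the encoding behaviour described above. It is not the C code of the parser. It assumes the directory name is stored on disk as UTF-8 bytes and that the filesystem encoding is ASCII (as with LANG= in the example). The variable names are hypothetical, and the codecs are spelled out explicitly rather than going through PyUnicode_DecodeFSDefault, so the result does not depend on the current locale.

```python
# Bytes of the directory name "py3kéŁ", assumed to be stored on disk as UTF-8.
raw = "py3k\u00e9\u0141".encode("utf-8")

# With an ASCII filesystem encoding and surrogateescape, each undecodable
# byte 0xNN is decoded to the lone surrogate U+DCNN.
name = raw.decode("ascii", "surrogateescape")

# utf-8 in strict mode rejects lone surrogates: this is the failure
# described for fp_setreadl().
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Solution 1: filesystem encoding (ASCII here) with surrogateescape.
# The original bytes are restored.
fs_bytes = name.encode("ascii", "surrogateescape")

# Solution 2a: utf-8 with surrogateescape. The escaped surrogates are
# turned back into the original bytes as well.
u8_escape_bytes = name.encode("utf-8", "surrogateescape")

# Solution 2b: utf-8 with surrogatepass. The surrogates are encoded as
# 3-byte sequences, so the bytes differ from the on-disk name, but the
# str round-trips if decoded with the same error handler.
u8_pass_bytes = name.encode("utf-8", "surrogatepass")
u8_pass_roundtrip = u8_pass_bytes.decode("utf-8", "surrogatepass")
```

In this sketch both surrogateescape variants give back the on-disk bytes, whereas surrogatepass only preserves the str value, so a bytes filename produced that way could not be passed directly to the operating system.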