New submission from STINNER Victor <victor.stin...@haypocalc.com>:

It looks like the parser API (e.g. PyParser_ParseFileFlagsEx, 
PyParser_ASTFromFile) expects a utf-8 filename: err_input() decodes the filename 
from utf-8.

Example in a non-ascii directory (/home/SHARE/SVN/py3kéŁ) and an ascii locale:
----
$ LANG= ./python -c "import inspect"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/SHARE/SVN/py3k\xe9\u0141/Lib/inspect.py", line 1
SyntaxError: encoding problem: with BOM
----

The problem occurs in fp_setreadl(): this function reopens the file with the 
right encoding. But to open the file, the bytes filename is decoded from utf-8 
in strict mode, whereas the filename in my example contains surrogates, which 
utf-8 in strict mode rejects.
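
The rejection of surrogates by strict utf-8 can be seen from pure Python. This 
is only a minimal sketch, not the code path in fp_setreadl(): the path in 
raw_name and the helper strict_utf8_roundtrip() are hypothetical, made up for 
illustration.

```python
# Hypothetical byte filename: 0xE9 is neither valid ASCII nor, on its own,
# valid utf-8.
raw_name = b"/tmp/py3k\xe9/inspect.py"

# Decoding with surrogateescape maps the undecodable byte 0xE9 to the lone
# surrogate U+DCE9 instead of raising an error.
name = raw_name.decode("ascii", "surrogateescape")


def strict_utf8_roundtrip(text):
    """Return True if text survives a strict utf-8 encode/decode cycle."""
    try:
        return text.encode("utf-8").decode("utf-8") == text
    except UnicodeError:
        # Strict utf-8 refuses to encode lone surrogates.
        return False


# ascii() is used because printing a lone surrogate directly can itself fail.
print(ascii(name))
print(strict_utf8_roundtrip("/tmp/py3k/inspect.py"))
print(strict_utf8_roundtrip(name))
```

The last call fails the round trip because the string contains a surrogate, 
which is the same kind of failure as the one described above.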

To support undecodable filenames in the parser API, we have two solutions:

 * Use the filesystem encoding with surrogateescape (PyUnicode_EncodeFSDefault, 
PyUnicode_DecodeFSDefault)
 * Use utf-8 in another mode: surrogateescape or surrogatepass
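
The difference between these options can be sketched from Python. This is an 
illustration under assumptions, not a proposed patch: raw_name is a 
hypothetical path, a POSIX system is assumed, and os.fsdecode()/os.fsencode() 
are used here as rough Python-level counterparts of the C functions named 
above.

```python
import os

# Hypothetical byte filename; 0xE9 followed by "/" is not valid utf-8.
raw_name = b"/tmp/py3k\xe9/inspect.py"

# Option 1: filesystem encoding with surrogateescape (POSIX assumed).
fs_text = os.fsdecode(raw_name)
fs_roundtrip = os.fsencode(fs_text)

# Option 2a: utf-8 with surrogateescape. Undecodable bytes become lone
# surrogates on decoding and are restored to the same bytes on encoding.
escaped = raw_name.decode("utf-8", "surrogateescape")
escaped_roundtrip = escaped.encode("utf-8", "surrogateescape")

# Option 2b: utf-8 with surrogatepass. The surrogate is encoded as a
# three-byte sequence, so the result differs from the original bytes, but
# the string itself still round-trips.
passed = escaped.encode("utf-8", "surrogatepass")
passed_roundtrip = passed.decode("utf-8", "surrogatepass")

print(fs_roundtrip == raw_name)
print(escaped_roundtrip == raw_name)
print(passed == raw_name, passed_roundtrip == escaped)
```

Note that surrogatepass only helps in the str -> bytes -> str direction: 
decoding raw_name itself with utf-8 and surrogatepass would still fail, 
because 0xE9 is not part of an encoded surrogate.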

The parser API has many public functions, and we have to consider 
compatibility with Python 3.1.

See also #9713 and #8611.

----------
components: Interpreter Core, Unicode
messages: 118604
nosy: haypo
priority: normal
severity: normal
status: open
title: Support undecodable filenames in the parser API
versions: Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10095>
_______________________________________