STINNER Victor <victor.stin...@haypocalc.com> added the comment:

The problem is not specific to Py_CompileString(): all functions based 
(indirectly) on PyParser_ASTFromString() and PyParser_ASTFromFile() expect 
filenames encoded in utf-8 with the strict error handler.
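
As a small illustration of that constraint (a sketch in Python rather than C, using a made-up filename that is not taken from the report), a byte string that is valid ISO-8859-1 but not valid utf-8 is rejected by the strict error handler:

```python
# Hypothetical filename: "café.py" encoded as ISO-8859-1, so the byte
# 0xE9 does not start a valid utf-8 sequence.
raw_filename = b"caf\xe9.py"


def decode_strict(raw):
    """Decode bytes as strict utf-8; return None if they are undecodable."""
    try:
        return raw.decode("utf-8", "strict")
    except UnicodeDecodeError:
        return None


strict_result = decode_strict(raw_filename)  # undecodable, so None
ascii_result = decode_strict(b"cafe.py")     # pure ASCII decodes fine
```

This only mimics the decoding step; it does not call the parser functions themselves.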

If we choose to use something other than utf-8 in strict mode, here is an 
incomplete list of functions that would have to be patched:
 - parser:
   * initerr()
   * err_input()
 - ast:
   * ast_error_finish()

And the list of impacted functions (parsing functions accepting filenames):
 - PyParser_ParseStringFlagsFilename()
 - PyParser_ParseFile*()
 - PyParser_ASTFromString(), PyParser_ASTFromFile()
 - PyAST_FromNode()
 - PyRun_SimpleFile*()
 - PyRun_AnyFile*()
 - PyRun_InteractiveOneFlags()
 - etc.

All these functions are public and I don't think that it would be a good idea 
to change the encoding (e.g. to iso-8859-1). We can use a different error 
handler (especially surrogateescape, as suggested in the initial message) 
and/or create new functions accepting unicode filenames.
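
To show why surrogateescape is attractive here, the following sketch (Python, with a hypothetical filename chosen only for illustration) decodes undecodable bytes into lone surrogates and recovers the original bytes on encoding:

```python
# Hypothetical filename encoded as ISO-8859-1; 0xE9 is invalid in utf-8.
raw_filename = b"caf\xe9.py"

# surrogateescape maps each undecodable byte 0xNN to the lone surrogate
# U+DCNN instead of raising UnicodeDecodeError.
decoded = raw_filename.decode("utf-8", "surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
roundtrip = raw_filename == decoded.encode("utf-8", "surrogateescape")

# The strict handler still refuses to encode the lone surrogate.
try:
    decoded.encode("utf-8", "strict")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

The last step is the catch: any code that later re-encodes such a filename with the strict handler still fails, which is why the functions listed above would need patching.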

--

I'm working on undecodable filenames in issues #8611 and #9425, especially on 
the import machinery part. Once the import machinery is fully unicode 
compliant, the last part will be the "parser machinery" (Parser/*.c). The 
parser is a little more complex to patch because of a bootstrap problem: it 
is compiled twice, once with a small subset of the C Python API (using some 
mockups) and once with the full API.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9713>
_______________________________________