[issue4282] (Python3) The profile module deesn't understand the character set definition

STINNER Victor Sun, 09 Nov 2008 17:03:36 -0800

STINNER Victor <[EMAIL PROTECTED]> added the comment:

Exemple of the problem: exec('#header\n# encoding:
ISO-8859-1\nprint("h\xe9 h\xe9")\n')


exec(unicode) calls source_as_string() which converts unicode to bytes
using _PyUnicode_AsDefaultEncodedString() (UTF-8 charset). Then
PyRun_StringFlags() is called with the UTF-8 byte string with
PyCF_SOURCE_IS_UTF8 flag. But in the parser, get_coding_spec() recognize
the "#coding:" header and convert bytes to unicode using the specified
charset (which may be different than UTF-8).

The problem is in the function PyAST_FromNode(): the flag in not used in
the tokenizer but only in the AST parser. I also see:
    if (flags && flags->cf_flags & PyCF_SOURCE_IS_UTF8) {
        c.c_encoding = "utf-8";
        if (TYPE(n) == encoding_decl) {
#if 0
            ast_error(n, "encoding declaration in Unicode string");
            goto error;
#endif
            n = CHILD(n, 0);
        }
    } else if (TYPE(n) == encoding_decl) {
        c.c_encoding = STR(n);
        n = CHILD(n, 0);
    } else {
        /* PEP 3120 */
        c.c_encoding = "utf-8";
    }

The ast_error() may be uncommented.

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue4282>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue4282] (Python3) The profile module deesn't understand the character set definition

Reply via email to