Bugs item #1115379 was opened at 2005-02-03 14:11
Message generated for change (Comment added) made by loewis

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1115379&group_id=5470
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Parser/Compiler
Group: Python 2.4
Status: Open
Resolution: None
Priority: 7
Submitted By: Christoph Zwerschke (cito)
Assigned to: Martin v. Löwis (loewis)
Summary: Built-in compile function with PEP 0263 encoding bug

Initial Comment:
a = 'print "Hello, World"'
u = '# -*- coding: utf-8 -*-\n' + a
print compile(a, '<string>', 'exec')           # ok
print compile(u, '<string>', 'exec')           # ok
print compile(unicode(a), '<string>', 'exec')  # ok
print compile(unicode(u), '<string>', 'exec')  # error
# The last line gives a SystemError.
# I think this is a bug.

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2006-03-22 14:56

Message:
Logged In: YES
user_id=21627

I've committed this patch (along with a test case) as revision 43227 into the 2.4 branch; the trunk still needs fixing.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-20 22:14

Message:
Logged In: YES
user_id=33168

Actually, I don't much care about the answer as long as it isn't a core dump, a failed assert, or a SystemError. I'm fine with a syntax error.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-03-20 10:03

Message:
Logged In: YES
user_id=21627

I still wonder why anybody would want to do that, so I don't see it as a big problem that it gives an error in 2.4: it *should* give an error, although not the one it currently gives. It seems that wigy expects the encoding declaration to be ignored, whereas you (nnorwitz) are suggesting that the UTF-8 default should be ignored. In the face of ambiguity, refuse the temptation to guess. So I still think it should give a SyntaxError instead. I'll attach an alternative patch.
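To make the SyntaxError behavior argued for above concrete, here is a minimal sketch written for Python 3 (not the 2.4 interpreter discussed here, and not the actual C patch attached to this item). It assumes a hypothetical wrapper, strict_compile, that refuses a text (str) source carrying a PEP 263 encoding declaration instead of guessing which encoding wins. The names find_coding_declaration and strict_compile are illustrative only, and the regular expression is an assumed transcription of the PEP 263 pattern. Byte strings are passed through unchanged, so their declaration still selects the codec.

```python
import re

# Assumed transcription of the PEP 263 declaration pattern.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)", re.ASCII)


def find_coding_declaration(source):
    """Return the encoding declared in the first two lines of source, or None."""
    for index, line in enumerate(source.splitlines()[:2]):
        match = CODING_RE.match(line)
        if match:
            return match.group(1)
        # A declaration on line 2 only counts if line 1 is blank or a comment.
        stripped = line.strip()
        if index == 0 and stripped and not stripped.startswith("#"):
            break
    return None


def strict_compile(source, filename="<string>", mode="exec"):
    """Hypothetical wrapper: reject text sources that carry an encoding declaration.

    Text (str) is already decoded, so a declaration inside it is ambiguous.
    Bytes are passed through so their declaration still selects the codec.
    """
    if isinstance(source, str):
        encoding = find_coding_declaration(source)
        if encoding is not None:
            raise SyntaxError(
                "encoding declaration %r in Unicode string" % encoding
            )
    return compile(source, filename, mode)
```

With the report's example rewritten for Python 3, strict_compile(a) compiles normally, strict_compile(u) raises SyntaxError, and strict_compile(u.encode("utf-8")) compiles because the declaration now describes real bytes.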
----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-20 09:28

Message:
Logged In: YES
user_id=33168

Martin, the attached patches (2.4 and 2.5) fix the problem. However, according to one of your notes, the patches would seem to violate the PEP. I'm not sure about all the details, but ISTM, based on your comment, that if (flags && flags->cf_flags & PyCF_SOURCE_IS_UTF8) and (TYPE(n) == encoding_decl), an error should be returned? I would like to get this fixed for 2.4.3, so we need to move fast. 2.5 can wait and is trivial to fix once we know what this is supposed to do.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-02-20 22:37

Message:
Logged In: YES
user_id=849994

This even aborts the interpreter in 2.5 HEAD with a failing assertion.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2005-09-28 07:48

Message:
Logged In: YES
user_id=21627

If you load the files manually, why do you want to decode them to Unicode before compile()ing them? Couldn't you just pass the bytes you read from the file to compile()?

----------------------------------------------------------------------

Comment By: Vágvölgyi Attila (wigy)
Date: 2005-09-28 06:29

Message:
Logged In: YES
user_id=156682

If this special case is a feature, not a bug, then it certainly breaks some symmetry. If I run a script with UTF-8 encoding from a file via "python script.py", it has to have an encoding declaration. Now if I want to load the same file manually and decode it to a unicode object, I also have to remove the encoding declaration at the beginning of the file before I can pass it to the compile() function. What advantage comes from the compiler not simply ignoring encoding declaration nodes in unicode objects?
Does this error message catch real mistakes, or does it just make the compiler code simpler?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2005-02-10 01:37

Message:
Logged In: YES
user_id=21627

There is a bug somewhere, certainly. However, I believe it is in PEP 263, which should point out that unicode strings passed to compile() are only legal if they do *not* contain an encoding declaration, as such strings are implicitly encoded as UTF-8.

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list
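The two ways out discussed in this thread, Martin's suggestion of handing compile() the bytes read from the file and wigy's workaround of removing the declaration before compiling decoded text, can be sketched as follows. This is a minimal illustration written for Python 3, not code from the tracker item. compile_source_bytes and decode_without_declaration are hypothetical helper names, and the declaration regex is an assumption modeled on PEP 263; tokenize.detect_encoding is the standard-library function that applies the PEP 263 rules to the first two lines.

```python
import io
import re
import tokenize

# Assumed transcription of the PEP 263 declaration pattern.
CODING_RE = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)", re.ASCII)


def compile_source_bytes(data, filename="<file>"):
    """Compile raw file bytes; the PEP 263 declaration (if any) selects the codec."""
    return compile(data, filename, "exec")


def decode_without_declaration(data):
    """Hypothetical helper: decode bytes as declared, then blank the declaration.

    The declaration line is replaced by an empty line instead of being deleted,
    so line numbers reported in tracebacks stay the same.
    """
    # detect_encoding reads at most two lines and applies the PEP 263 rules.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(data).readline)
    lines = data.decode(encoding).splitlines(keepends=True)
    for index, line in enumerate(lines[:2]):
        if CODING_RE.match(line):
            lines[index] = "\n" if line.endswith("\n") else ""
    return "".join(lines)
```

For a Latin-1 file such as b'# -*- coding: latin-1 -*-\ns = "\xe9"\n', both routes bind s to the same text: the first lets the compiler honor the declaration, the second decodes once and leaves nothing ambiguous in the str handed to compile().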