[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-16 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: Applied in r66951. -- status: open -> closed

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-16 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: Sorry about that; been one of those days. Doing a svn up and making sure it still compiles fine.

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-16 Thread Barry A. Warsaw
Barry A. Warsaw <[EMAIL PROTECTED]> added the comment: Brett, please apply and close the issue. -- nosy: +barry

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-15 Thread Martin v. Löwis
Martin v. Löwis <[EMAIL PROTECTED]> added the comment:
> I added a test for compile() in there, which is why the patch is claiming that. There is an uploaded version of test_pep3120.py on the issue.
Ah, ok. I missed that - that change is also fine.

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-15 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment:
On Tue, Oct 14, 2008 at 11:05 PM, Martin v. Löwis <[EMAIL PROTECTED]> wrote:
> Martin v. Löwis <[EMAIL PROTECTED]> added the comment:
> The patch looks fine to me, please apply.
Great!
> I notice that the diff file reports changes to tes

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-14 Thread Martin v. Löwis
Martin v. Löwis <[EMAIL PROTECTED]> added the comment: The patch looks fine to me, please apply. I notice that the diff file reports changes to test_pep3120.py. No such changes should be necessary, so please exclude them from the commit. -- assignee: loewis -> brett.cannon keywords: -n

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-10 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: Yep, it passes for me now. Martin, have any objection to this patch? -- assignee: -> loewis

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-10 Thread STINNER Victor
STINNER Victor <[EMAIL PROTECTED]> added the comment: Amaury applied both of my patches, for issues #2384 and #3975, so all tests now pass with Python trunk + alt_latin_1.diff.

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-07 Thread STINNER Victor
STINNER Victor <[EMAIL PROTECTED]> added the comment: The test_sys failure is fixed by issue #2384. -- dependencies: +[Py3k] line number is wrong after encoding declaration

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-06 Thread Brett Cannon
Changes by Brett Cannon <[EMAIL PROTECTED]>: Added file: http://bugs.python.org/file11722/alt_latin_1.diff

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-06 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: So I tried the patch (I attached my version with different comments in the header file and removed some redundant whitespace changes), and test_sys consistently fails for me: test_current_frames (__main__.SysModuleTest) AssertionError: ''

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-06 Thread STINNER Victor
Changes by STINNER Victor <[EMAIL PROTECTED]>: Added file: http://bugs.python.org/file11716/tokenizer_iso-8859-1-patch3.patch

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-06 Thread STINNER Victor
Changes by STINNER Victor <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11715/python3_bytes_filename-3.patch

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-06 Thread STINNER Victor
STINNER Victor <[EMAIL PROTECTED]> added the comment: My patch version 2 included an "unrelated" fix for issue2384. Added file: http://bugs.python.org/file11715/python3_bytes_filename-3.patch

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-06 Thread STINNER Victor
Changes by STINNER Victor <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11698/tokenizer_iso-8859-1.patch

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-03 Thread STINNER Victor
STINNER Victor <[EMAIL PROTECTED]> added the comment: After reading tokenizer.c 1000 times, I finally used grep:
$ grep -l -i 'iso.8859.1' $(find -name "*.c")
./Python/ast.c <~~~ WTF?
./Objects/unicodeobject.c
./Parser/tokenizer.c
./Modules/cjkcodecs/_codecs_iso2022.c
./Modules/expat/xmltok.c

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-03 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: Thanks for finding that, Victor! I will do a patch review when I have a chance (it won't be until after the weekend). -- assignee: -> brett.cannon

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-03 Thread STINNER Victor
STINNER Victor <[EMAIL PROTECTED]> added the comment: @brett.cannon: I found it: ast.c used a hack for iso-8859-1! Since this hack introduces a bug (your compile(...) example), I prefer to remove it to simplify the code. The new patch just removes the hack in tokenizer.c and ast.c. It does also

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-03 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: Sorry, I misspoke: your patch, Victor, doesn't change the state to NORMAL. But my worry still stands; why does iso-8859-1 need to be special-cased? It suggests to me that something more fundamental needs to be dealt with instead of just patching aro

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-03 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: But why does iso-8859-1 need to be treated as a special case? UTF-8 is special because it is the default encoding for source. But iso-8859-1 really shouldn't be special, IMO. Your patch does exactly what happens lower when set_readline() succee

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-03 Thread STINNER Victor
STINNER Victor <[EMAIL PROTECTED]> added the comment: It looks like the problem with fix_latin.diff is the decoding_state: it's set to STATE_NORMAL whereas the current behaviour is to stay in state STATE_RAW. I wrote another patch which is a mix of case 1 (utf-8: just set tok->encoding) and case 2

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-03 Thread STINNER Victor
STINNER Victor <[EMAIL PROTECTED]> added the comment: See also Lib/test/test_shlex.py: trunk is ok, but with fix_latin.diff the test fails.

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-03 Thread STINNER Victor
STINNER Victor <[EMAIL PROTECTED]> added the comment: Using py3k trunk + fix_latin.diff:
- compile(b'# coding: latin-1\nu = "\xC7"\n', '', 'exec') doesn't fail
- test_pep3120.py is ok
- but executing an ISO-8859-1 script fails: see attached iso.py
Original Python3:
$ python iso.py
'Bonjour ma c

[issue3574] compile() cannot decode Latin-1 source encodings

2008-10-02 Thread Barry A. Warsaw
Changes by Barry A. Warsaw <[EMAIL PROTECTED]>: -- priority: deferred blocker -> release blocker

[issue3574] compile() cannot decode Latin-1 source encodings

2008-09-26 Thread Barry A. Warsaw
Changes by Barry A. Warsaw <[EMAIL PROTECTED]>: -- priority: release blocker -> deferred blocker

[issue3574] compile() cannot decode Latin-1 source encodings

2008-09-17 Thread Barry A. Warsaw
Changes by Barry A. Warsaw <[EMAIL PROTECTED]>: -- priority: deferred blocker -> release blocker

[issue3574] compile() cannot decode Latin-1 source encodings

2008-09-09 Thread Barry A. Warsaw
Changes by Barry A. Warsaw <[EMAIL PROTECTED]>: -- priority: release blocker -> deferred blocker

[issue3574] compile() cannot decode Latin-1 source encodings

2008-09-05 Thread Brett Cannon
Changes by Brett Cannon <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11131/pep3120_test.diff

[issue3574] compile() cannot decode Latin-1 source encodings

2008-09-05 Thread Brett Cannon
Changes by Brett Cannon <[EMAIL PROTECTED]>: Added file: http://bugs.python.org/file11399/test_pep3120.py

[issue3574] compile() cannot decode Latin-1 source encodings

2008-09-05 Thread Brett Cannon
Changes by Brett Cannon <[EMAIL PROTECTED]>: Removed file: http://bugs.python.org/file11130/fix_latin.diff

[issue3574] compile() cannot decode Latin-1 source encodings

2008-09-05 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: I have attached a new version of the patch with the changes to test_imp removed as issue 3594 fixed the need for the change. I have also directly uploaded test_pep3120.py since it is flagged as binary and thus cannot be diffed by svn. Added fil

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-24 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: The test_imp stuff has to do with PyTokenizer_FindEncoding(). imp.find_module() only opens the file, passes the file descriptor to PyTokenizer_FindEncoding() and then returns a file object with the found encoding. Problem is that (as issue 3594
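For readers who want to see encoding detection from a coding cookie in action, here is a minimal sketch using the standard library's `tokenize.detect_encoding()`. It is a Python-level counterpart, not the C function `PyTokenizer_FindEncoding()` discussed above, and the helper name `find_source_encoding` is made up for illustration.

```python
import io
import tokenize


def find_source_encoding(source: bytes) -> str:
    """Return the encoding declared by a PEP 263 cookie (or the default)."""
    # detect_encoding() wants a readline callable that yields bytes lines.
    encoding, _consumed_lines = tokenize.detect_encoding(
        io.BytesIO(source).readline
    )
    return encoding


# Source with an explicit Latin-1 cookie, and source with no cookie at all.
declared = find_source_encoding(b'# coding: latin-1\nu = 1\n')
default = find_source_encoding(b'u = 1\n')
print(declared, default)
```

Note that the declared name comes back normalized, so `latin-1` is reported under its canonical spelling rather than as written in the cookie.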

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-24 Thread Martin v. Löwis
Martin v. Löwis <[EMAIL PROTECTED]> added the comment:
> As for treating Latin-1 as a raw encoding, how can that be theoretically okay if the parser assumes UTF-8 and Latin-1 is not a superset of Latin-1?
The parser doesn't assume UTF-8, but "ascii+", i.e. it passes all non-ASCII bytes on to t
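The concern about treating Latin-1 as a raw encoding can be made concrete with the byte from the original report. This is only an illustrative sketch: it shows that the Latin-1 byte `\xC7` and the UTF-8 encoding of the same character differ, and that the bare Latin-1 byte is not valid UTF-8. The helper `is_valid_utf8` is a hypothetical name.

```python
# The single byte used in the bug report, as it appears in Latin-1 source.
latin1_bytes = b'\xC7'

# Decoding as Latin-1 yields U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA).
text = latin1_bytes.decode('latin-1')

# The same character needs two bytes in UTF-8.
utf8_bytes = text.encode('utf-8')


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes decode cleanly as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


print(ascii(text), utf8_bytes, is_valid_utf8(latin1_bytes))
```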

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-24 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: Actually, the tests don't have to change; if issue 3594 gets applied then that change cascades into this issue and negates the need to change the tests themselves. As for treating Latin-1 as a raw encoding, how can that be theoretically okay if

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-24 Thread Martin v. Löwis
Martin v. Löwis <[EMAIL PROTECTED]> added the comment: Since this is marked "release blocker", I'll provide a shallow comment: I don't think it should be a release blocker. It's a bug in the compile function, and there are various work-arounds (such as saving the bytes to a temporary file and ex
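The message is cut off here, so the following is only one possible reading of the temporary-file workaround: write the bytes to a file and let a fresh interpreter run it, so the source is read from disk instead of being passed to compile(). The helper name `run_source_bytes` and the sample script are assumptions for illustration.

```python
import os
import subprocess
import sys
import tempfile


def run_source_bytes(source: bytes) -> str:
    """Write source bytes to a temporary .py file, run it, return stdout."""
    fd, path = tempfile.mkstemp(suffix='.py')
    try:
        with os.fdopen(fd, 'wb') as handle:
            handle.write(source)
        completed = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            check=True,
        )
        return completed.stdout
    finally:
        os.remove(path)


# ascii() keeps the child's output pure ASCII, independent of the locale.
output = run_source_bytes(
    b'# coding: latin-1\nu = "\xC7"\nprint(ascii(u))\n'
)
print(output)
```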

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-21 Thread Brett Cannon
Changes by Brett Cannon <[EMAIL PROTECTED]>: -- keywords: +needs review

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-21 Thread Brett Cannon
Changes by Brett Cannon <[EMAIL PROTECTED]>: -- priority: critical -> release blocker

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-18 Thread Benjamin Peterson
Benjamin Peterson <[EMAIL PROTECTED]> added the comment: That line dates back to the PEP 263 implementation. Martin? -- nosy: +benjamin.peterson, loewis

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-18 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: There is a potential dependency on issue3594 as it would change how imp.find_module() acts and thus make test_imp no longer fail in the way it has. -- dependencies: +PyTokenizer_FindEncoding() never succeeds

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-18 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: Can someone double-check this patch for me? I don't have much experience with the parser so I want to make sure I am not doing anything wrong.

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-16 Thread Brett Cannon
Changes by Brett Cannon <[EMAIL PROTECTED]>: -- type: -> behavior

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-16 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: Attached is a test for test_pep3120 (since that is what most likely introduced the breakage). It's a separate patch since the source file is marked as binary and thus can't be diffed by ``svn diff``. -- components: +Interpreter Core pri

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-16 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: Here is a potential fix. It broke test_imp because it assumed that Latin-1 source files would be encoded as Latin-1 instead of UTF-8 when returned by imp.new_module(). Doesn't seem like a critical change as the file is still properly decoded. -

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-16 Thread Brett Cannon
Brett Cannon <[EMAIL PROTECTED]> added the comment: Looks like Parser/tokenizer.c:check_coding_spec() considered Latin-1 a raw encoding just like UTF-8. Patch is in the works.

[issue3574] compile() cannot decode Latin-1 source encodings

2008-08-16 Thread Brett Cannon
New submission from Brett Cannon <[EMAIL PROTECTED]>: The following leads to a SyntaxError in 3.0:
compile(b'# coding: latin-1\nu = "\xC7"\n', '', 'exec')
That is not the case in Python 2.6. -- messages: 71251 nosy: brett.cannon severity: normal status: open title: compile() cannot d
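The one-line reproduction from the report can be turned into a small check. This is a minimal sketch, assuming an interpreter that already contains the fix; the helper name `compile_latin1_sample` and the `<latin-1 sample>` filename are made up for illustration.

```python
def compile_latin1_sample() -> str:
    """Compile and run the bytes source from the report, return `u`."""
    # Bytes source with a PEP 263 coding cookie; \xC7 is a raw Latin-1 byte.
    source = b'# coding: latin-1\nu = "\xC7"\n'
    code = compile(source, '<latin-1 sample>', 'exec')
    namespace = {}
    exec(code, namespace)
    return namespace['u']


value = compile_latin1_sample()
print(ascii(value))
```

On an affected interpreter the compile() call itself would raise SyntaxError, so the function would never reach exec().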