[ python-Bugs-1251300 ] Decoding with unicode_internal segfaults on UCS-4 builds
Bugs item #1251300, was opened at 2005-08-03 21:49
Message generated for change (Comment added) made by nhaldimann
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1251300&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Unicode
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Nik Haldimann (nhaldimann)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Decoding with unicode_internal segfaults on UCS-4 builds

Initial Comment:
On UCS-4 builds, decoding a byte string with the unicode_internal codec doesn't correctly work for code points from 0x8000 upwards and even segfaults. I have observed the same behaviour on 2.5 from CVS and 2.4.0 on OS X/PowerPC as well as on 2.3.5 on Linux/x86. Here's an example:

Python 2.5a0 (#1, Aug 3 2005, 21:34:05)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1671)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> "\x7f\xff\xff\xff".decode("unicode_internal")
u'\U7fff'
>>> "\x80\x00\x00\x00".decode("unicode_internal")
u'\x00'
>>> "\x80\x00\x00\x01".decode("unicode_internal")
u'\x01'
>>> "\x81\x00\x00\x00".decode("unicode_internal")
Segmentation fault

On little endian architectures the byte strings must be reversed for the same effect.

I'm not sure if I understand what's going on, but I guess there are 2 solution strategies:

1. Make unicode_internal work for any code point up to 0x.
2. Make unicode_internal raise a UnicodeDecodeError for anything above 0x10 (== sys.maxunicode for UCS-4 builds).

It seems like there are no unicode code points above 0x10, so the latter solution feels more correct to me, even though it might break backwards compatibility a tiny bit. The unicodeescape codec already does a similar thing:

>>> u"\U0011"
UnicodeDecodeError: 'unicodeescape' codec can't decode bytes in position 0-9: illegal Unicode character

--

>Comment By: Nik Haldimann (nhaldimann)
Date: 2005-08-19 16:17

Message:
Logged In: YES
user_id=1317086

I agree about the ifdefs. I'm not sure about how to handle input strings of incorrect length. I guess raising an UnicodeDecodeError is in order. But I think it doesn't make sense to let it pass through the error handler, since the data the handler would see is potentially nonsensical (e.g., the code point value). Can you comment on this? Is it ok to raise a UnicodeDecodeError and skip the error handler here?

--

Comment By: Walter Dörwald (doerwalter)
Date: 2005-08-18 22:17

Message:
Logged In: YES
user_id=89016

The patch has a problem with input strings of a length that is not a multiple of 4, e.g. "\x00".decode("unicode-internal") returns u"" instead of raising an error.

Also in a UCS-2 build most of the tests are irrelevant (as it's not possible to create codepoints above 0x10 even when using surrogates), so probably they should be ifdef'd out.

--

Comment By: Nik Haldimann (nhaldimann)
Date: 2005-08-05 23:08

Message:
Logged In: YES
user_id=1317086

Here's the patch with error handler support + test. Again: Please review carefully.

--

Comment By: Nik Haldimann (nhaldimann)
Date: 2005-08-05 18:35

Message:
Logged In: YES
user_id=1317086

Ah, that PEP clears some things up for me. I will look into it, but I hope you realize this requires tinkering with unicodeobject.c since the error handler code seems to live there.

--

Comment By: Walter Dörwald (doerwalter)
Date: 2005-08-05 18:03

Message:
Logged In: YES
user_id=89016

Your patch doesn't support PEP 293 error handlers. Could you add support for that?

--

Comment By: Nik Haldimann (nhaldimann)
Date: 2005-08-05 16:50

Message:
Logged In: YES
user_id=1317086

OK, I put something together. Please review carefully as I'm not very familiar with the C API. I have tested this with the CVS HEAD on OS X and Linux.

--

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-08-04 16:41

Message:
Logged In: YES
user_id=38388

I think solution 2 is the right approach, since UCS-4 only has 0x10 possible code points. Could you provide a patch ?

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1251300&group_id=5470
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/opt
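Editorial sketch: the behaviour agreed on in this thread (reject out-of-range code points, and reject input whose length is not a multiple of 4) can be illustrated in pure Python. This is not the C patch attached to the tracker item. The function name decode_ucs4 and the codec label "ucs4-sketch" are hypothetical, and the code assumes Python 3.3 or later, where sys.maxunicode is always the highest valid code point.

```python
import struct
import sys


def decode_ucs4(data, byteorder="big"):
    """Decode raw 4-byte code units, rejecting invalid input (illustrative only)."""
    remainder = len(data) % 4
    if remainder:
        # A trailing partial unit is reported as an error instead of being dropped.
        raise UnicodeDecodeError(
            "ucs4-sketch", data, len(data) - remainder, len(data), "truncated data"
        )
    # "I" reads each unit as an unsigned 32-bit integer.
    fmt = (">" if byteorder == "big" else "<") + "%dI" % (len(data) // 4)
    chars = []
    for index, value in enumerate(struct.unpack(fmt, data)):
        if value > sys.maxunicode:
            raise UnicodeDecodeError(
                "ucs4-sketch", data, index * 4, index * 4 + 4,
                "code point above sys.maxunicode",
            )
        chars.append(chr(value))
    return "".join(chars)


print(decode_ucs4(b"\x00\x00\x00\x41"))
for bad in (b"\x81\x00\x00\x00", b"\x00"):
    try:
        decode_ucs4(bad)
    except UnicodeDecodeError as exc:
        print("rejected:", exc.reason)
```

Unlike the patch under review, this sketch raises directly and does not route errors through a PEP 293 error handler.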
[ python-Bugs-1251300 ] Decoding with unicode_internal segfaults on UCS-4 builds
Bugs item #1251300, was opened at 2005-08-03 21:49
Message generated for change (Comment added) made by doerwalter

Category: Unicode
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Nik Haldimann (nhaldimann)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Decoding with unicode_internal segfaults on UCS-4 builds

--

>Comment By: Walter Dörwald (doerwalter)
Date: 2005-08-19 17:39

Message:
Logged In: YES
user_id=89016

The data the handler sees is nonsensical by definition. ;)

To get an idea how to handle an incorrect length, take a look at Objects/unicodeobject.c::PyUnicode_DecodeUTF16Stateful()
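Editorial sketch: for readers unfamiliar with the PEP 293 error handlers discussed in this thread, the following shows how such a handler looks from the Python side, using the standard codecs.register_error() call. The handler name "mark_bad_bytes" is made up for illustration, and the example uses the utf-8 and utf-16-le codecs of a current Python 3 rather than unicode_internal.

```python
import codecs


def mark_bad_bytes(exc):
    """Replace each undecodable byte with a visible <xx> marker."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    # The handler sees the whole input plus the offending range.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("<%02x>" % byte for byte in bad)
    # Return the replacement text and the position at which decoding resumes.
    return replacement, exc.end


codecs.register_error("mark_bad_bytes", mark_bad_bytes)

decoded = b"ab\xffcd".decode("utf-8", errors="mark_bad_bytes")
print(decoded)

# Input of incorrect length (a dangling byte after one complete UTF-16 unit).
print(b"a\x00b".decode("utf-16-le", errors="mark_bad_bytes"))
```

The handler only receives byte offsets and the raw input, which is why it can be called even when no meaningful code point value exists for the bad data.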
[ python-Bugs-1251300 ] Decoding with unicode_internal segfaults on UCS-4 builds
Bugs item #1251300, was opened at 2005-08-03 21:49
Message generated for change (Comment added) made by lemburg

Category: Unicode
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Nik Haldimann (nhaldimann)
>Assigned to: Walter Dörwald (doerwalter)
Summary: Decoding with unicode_internal segfaults on UCS-4 builds

--

>Comment By: M.-A. Lemburg (lemburg)
Date: 2005-08-19 17:45

Message:
Logged In: YES
user_id=38388

Assigning to Walter, the error handler expert :-)
[ python-Bugs-1263656 ] IDLE on Mac
Bugs item #1263656, was opened at 2005-08-18 22:35
Message generated for change (Comment added) made by bsherwood
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1263656&group_id=5470

Category: IDLE
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Bruce Sherwood (bsherwood)
Assigned to: Nobody/Anonymous (nobody)
Summary: IDLE on Mac

Initial Comment:
Copying code from some browsers into IDLE on the Mac can leave the file with only \r (13) at the ends of lines (Safari doesn't seem to have this problem). Then checksyntax() in ScriptBinding.py fails to convert these into \n (10), and compile() fails. The effect is that a program which Python is willing to run gets a syntax error in IDLE.

I think the fix is in checksyntax() to add after

    source = re.sub(r"\r\n", "\n", source)

the following statement, which converts unaccompanied \r's into \n's::

    source = re.sub(r"\r", "\n", source)

I've tried this and it works, but someone with a better overview of end-of-line issues in Python should think through whether this is the appropriate fix.

--

>Comment By: Bruce Sherwood (bsherwood)
Date: 2005-08-19 12:16

Message:
Logged In: YES
user_id=34881

I should have said that this is in the environment of running IDLE on Mac OSX 10.4 under X11, using the fink distribution.

I should also say that there seem to be issues not only of compiling but also of editing/display. In a browser, click on a .py file, select all the text, copy, paste into IDLE. With Safari, it looks right and it runs. With NetScape, it displays all on one line, and it doesn't run (syntax error). I haven't studied the actual code to see what if anything IDLE does to attempt to detect the nature of text pasted into an edit window, but clearly it's different coming from two popular browsers.

--

Comment By: Bruce Sherwood (bsherwood)
Date: 2005-08-19 00:23

Message:
Logged In: YES
user_id=34881

A footnote: Now I don't understand why the substitution searches for r"\r\n", since this would seem to be the raw string which represents slash, r, slash, n, not the two-character string "\r\n"??
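Editorial sketch: the two substitutions proposed above, and the footnote's question about the raw string, can be checked with a few lines of Python. The helper name normalize_newlines is hypothetical and is not the actual IDLE code. The raw string r"\r\n" does contain four characters, but the re module interprets the \r and \n escapes in the pattern itself, so the pattern still matches a carriage return followed by a line feed.

```python
import re


def normalize_newlines(source):
    """Convert \\r\\n pairs and lone \\r characters to \\n."""
    source = re.sub(r"\r\n", "\n", source)  # Windows-style line endings
    source = re.sub(r"\r", "\n", source)    # leftover lone carriage returns
    return source


# Text with a CRLF ending and two lone CR endings, as a paste might produce.
pasted = "a = 1\r\nb = 2\rprint(a + b)\r"
clean = normalize_newlines(pasted)
compile(clean, "<pasted>", "exec")  # compiles once the endings are normalized

# The raw string is four characters long, yet the regex matches CR + LF.
print(len(r"\r\n"))
print(repr(re.sub(r"\r\n", "\n", "x\r\ny")))
```

The order matters: replacing lone \r first would turn each \r\n pair into two newlines.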
[ python-Bugs-1264168 ] PyArg_ParseTupleAndKeywords doesn't handle I format correctl
Bugs item #1264168, was opened at 2005-08-19 10:31
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1264168&group_id=5470

Category: Python Interpreter Core
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: John Finlay (finlay648)
Assigned to: Nobody/Anonymous (nobody)
Summary: PyArg_ParseTupleAndKeywords doesn't handle I format correctl

Initial Comment:
PyArg_ParseTupleAndKeywords fails with the message; "...impossible" when parsing an optional keyword param using "I" format. Using Python 2.3.5 but also observed in Python 2.4.x

The problem is a missing "I" handler in the skipitem function. I've attached a proposed patch.
[ python-Bugs-1264168 ] PyArg_ParseTupleAndKeywords doesn't handle I format correctl
Bugs item #1264168, was opened at 2005-08-19 19:31
Message generated for change (Comment added) made by birkenfeld

Category: Python Interpreter Core
Group: None
>Status: Closed
>Resolution: Duplicate
Priority: 5
Submitted By: John Finlay (finlay648)
Assigned to: Nobody/Anonymous (nobody)
Summary: PyArg_ParseTupleAndKeywords doesn't handle I format correctl

--

>Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-08-19 20:18

Message:
Logged In: YES
user_id=1188172

Duplicate of #893549. See patch #1212928 to fix all missing format codes.
[ python-Bugs-1264168 ] PyArg_ParseTupleAndKeywords doesn't handle I format correctl
Bugs item #1264168, was opened at 2005-08-19 10:31
Message generated for change (Comment added) made by finlay648

Category: Python Interpreter Core
Group: None
Status: Closed
Resolution: Duplicate
Priority: 5
Submitted By: John Finlay (finlay648)
Assigned to: Nobody/Anonymous (nobody)
Summary: PyArg_ParseTupleAndKeywords doesn't handle I format correctl

--

>Comment By: John Finlay (finlay648)
Date: 2005-08-19 11:40

Message:
Logged In: YES
user_id=1331852

And how would one look up a bug by bug number since there appears to be no obvious way to do this.
[ python-Bugs-1264168 ] PyArg_ParseTupleAndKeywords doesn't handle I format correctl
Bugs item #1264168, was opened at 2005-08-19 19:31
Message generated for change (Comment added) made by birkenfeld

Category: Python Interpreter Core
Group: None
Status: Closed
Resolution: Duplicate
Priority: 5
Submitted By: John Finlay (finlay648)
Assigned to: Nobody/Anonymous (nobody)
Summary: PyArg_ParseTupleAndKeywords doesn't handle I format correctl

--

>Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-08-19 20:48

Message:
Logged In: YES
user_id=1188172

Either order the bugs in SF by ID, or use an URL of the form http://www.python.org/sf/ (number without '#')
[ python-Bugs-1264666 ] PEP 8 uses wrong raise syntax
Bugs item #1264666, was opened at 2005-08-20 00:17
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1264666&group_id=5470

Category: Documentation
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Steven Bethard (bediviere)
Assigned to: Nobody/Anonymous (nobody)
Summary: PEP 8 uses wrong raise syntax

Initial Comment:
Despite the recommendation in "Programming Recommendations" to use::

    raise ValueError('message')

instead of::

    raise ValueError, 'message'

the PEP itself uses the second form under the "Maximum Line Length" section::

    raise ValueError, "sorry, you lose"
    ...
    raise ValueError, "I don't think so"
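Editorial sketch: the recommended call form can be exercised as below. The function check_positive and its message are invented for illustration and do not come from PEP 8. One practical point in favour of the call form: the comma form shown in the report is accepted only by Python 2 and is a syntax error in Python 3, whereas the call form works in both.

```python
def check_positive(value):
    """Return value unchanged, or raise ValueError using the call form."""
    if value <= 0:
        # Parentheses make it easy to wrap or format long messages.
        raise ValueError(
            "sorry, you lose: %r is not positive" % (value,)
        )
    return value


try:
    check_positive(-1)
except ValueError as exc:
    message = str(exc)
    print(message)
```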