[ python-Bugs-1251300 ] Decoding with unicode_internal segfaults on UCS-4 builds

2005-08-19 Thread SourceForge.net
Bugs item #1251300, was opened at 2005-08-03 21:49
Message generated for change (Comment added) made by nhaldimann
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1251300&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Unicode
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Submitted By: Nik Haldimann (nhaldimann)
Assigned to: M.-A. Lemburg (lemburg)
Summary: Decoding with unicode_internal segfaults on UCS-4 builds

Initial Comment:
On UCS-4 builds, decoding a byte string with the
unicode_internal codec doesn't work correctly for code
points from 0x80000000 upwards and can even segfault. I
have observed the same behaviour on 2.5 from CVS and
2.4.0 on OS X/PowerPC as well as on 2.3.5 on Linux/x86.
Here's an example:

Python 2.5a0 (#1, Aug  3 2005, 21:34:05) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1671)] on
darwin
Type "help", "copyright", "credits" or "license" for
more information.
>>> "\x7f\xff\xff\xff".decode("unicode_internal")
u'\U7fffffff'
>>> "\x80\x00\x00\x00".decode("unicode_internal")
u'\x00'
>>> "\x80\x00\x00\x01".decode("unicode_internal")
u'\x01'
>>> "\x81\x00\x00\x00".decode("unicode_internal")
Segmentation fault

On little endian architectures the byte strings must be
reversed for the same effect.

I'm not sure if I understand what's going on, but I
guess there are 2 solution strategies:

1. Make unicode_internal work for any code point up to
0xFFFFFFFF.

2. Make unicode_internal raise a UnicodeDecodeError for
anything above 0x10FFFF (== sys.maxunicode for UCS-4
builds).

It seems like there are no unicode code points above
0x10FFFF, so the latter solution feels more correct to
me, even though it might break backwards compatibility
a tiny bit. The unicodeescape codec already does a
similar thing:

>>> u"\U00110000"
UnicodeDecodeError: 'unicodeescape' codec can't decode
bytes in position 0-9: illegal Unicode character
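
A minimal sketch of solution 2, written in Python 3 rather than in the C of
unicodeobject.c, and not taken from the attached patch. The function name
decode_ucs4 and its byteorder parameter are hypothetical; the sketch only
illustrates rejecting any 4-byte unit above 0x10FFFF (and any trailing
partial unit) with a UnicodeDecodeError instead of building an invalid
string:

```python
import struct

MAX_CODE_POINT = 0x10FFFF  # sys.maxunicode on UCS-4 builds


def decode_ucs4(data, byteorder="big"):
    """Decode 4-byte code units, refusing values above 0x10FFFF.

    Hypothetical helper for illustration; the real codec lives in C.
    """
    fmt = ">I" if byteorder == "big" else "<I"
    if len(data) % 4:
        # A trailing partial unit is an error, not something to drop silently.
        start = len(data) - len(data) % 4
        raise UnicodeDecodeError(
            "unicode_internal", data, start, len(data), "truncated data")
    chars = []
    for pos in range(0, len(data), 4):
        (value,) = struct.unpack_from(fmt, data, pos)
        if value > MAX_CODE_POINT:
            raise UnicodeDecodeError(
                "unicode_internal", data, pos, pos + 4,
                "illegal code point (> 0x10FFFF)")
        chars.append(chr(value))
    return "".join(chars)
```

With this check, the byte string b"\x81\x00\x00\x00" from the session above
raises UnicodeDecodeError, while b"\x00\x10\xff\xff" still decodes to the
highest valid code point.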


--

>Comment By: Nik Haldimann (nhaldimann)
Date: 2005-08-19 16:17

Message:
Logged In: YES 
user_id=1317086

I agree about the ifdefs. I'm not sure about how to handle
input strings of incorrect length. I guess raising a
UnicodeDecodeError is in order. But I think it doesn't make
sense to let it pass through the error handler, since the
data the handler would see is potentially nonsensical (e.g.,
the code point value). Can you comment on this? Is it ok to
raise a UnicodeDecodeError and skip the error handler here?

--

Comment By: Walter Dörwald (doerwalter)
Date: 2005-08-18 22:17

Message:
Logged In: YES 
user_id=89016

The patch has a problem with input strings of a length that
is not a multiple of 4, e.g.
"\x00".decode("unicode-internal") returns u"" instead of
raising an error. Also in a UCS-2 build most of the tests
are irrelevant (as it's not possible to create codepoints
above 0x10FFFF even when using surrogates), so probably they
should be ifdef'd out.

--

Comment By: Nik Haldimann (nhaldimann)
Date: 2005-08-05 23:08

Message:
Logged In: YES 
user_id=1317086

Here's the patch with error handler support + test. Again:
Please review carefully.

--

Comment By: Nik Haldimann (nhaldimann)
Date: 2005-08-05 18:35

Message:
Logged In: YES 
user_id=1317086

Ah, that PEP clears some things up for me. I will look into
it, but I hope you realize this requires tinkering with
unicodeobject.c since the error handler code seems to live
there.

--

Comment By: Walter Dörwald (doerwalter)
Date: 2005-08-05 18:03

Message:
Logged In: YES 
user_id=89016

Your patch doesn't support PEP 293 error handlers. Could you
add support for that?

--

Comment By: Nik Haldimann (nhaldimann)
Date: 2005-08-05 16:50

Message:
Logged In: YES 
user_id=1317086

OK, I put something together. Please review carefully as I'm
not very familiar with the C API. I have tested this with
the CVS HEAD on OS X and Linux.

--

Comment By: M.-A. Lemburg (lemburg)
Date: 2005-08-04 16:41

Message:
Logged In: YES 
user_id=38388

I think solution 2 is the right approach, since UCS-4 only
has 0x10FFFF possible code points.

Could you provide a patch ?


--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1251300&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/opt

[ python-Bugs-1251300 ] Decoding with unicode_internal segfaults on UCS-4 builds

2005-08-19 Thread SourceForge.net
Bugs item #1251300, was opened at 2005-08-03 21:49
Message generated for change (Comment added) made by doerwalter

>Comment By: Walter Dörwald (doerwalter)
Date: 2005-08-19 17:39

Message:
Logged In: YES 
user_id=89016

The data the handler sees is nonsensical by definition. ;)
To get an idea how to handle an incorrect length, take a
look at Objects/unicodeobject.c::PyUnicode_DecodeUTF16Stateful()
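
A rough Python 3 sketch of the idea, not the C code in
PyUnicode_DecodeUTF16Stateful() and not the patch under review. It assumes
big-endian input and a hypothetical function name, and it hands both
out-of-range units and a truncated tail to a PEP 293 handler obtained with
codecs.lookup_error():

```python
import codecs
import struct


def decode_ucs4_with_handler(data, errors="strict"):
    """Decode big-endian 4-byte units, routing bad input to an error handler.

    Hypothetical helper for illustration only.
    """
    handler = codecs.lookup_error(errors)
    out = []
    pos = 0
    while pos < len(data):
        end = min(pos + 4, len(data))
        if end - pos < 4:
            reason = "truncated data"
        else:
            (value,) = struct.unpack_from(">I", data, pos)
            if value <= 0x10FFFF:
                out.append(chr(value))
                pos = end
                continue
            reason = "illegal code point (> 0x10FFFF)"
        exc = UnicodeDecodeError("unicode_internal", data, pos, end, reason)
        # The "strict" handler re-raises exc; others return a replacement
        # string and the position at which decoding should resume.
        replacement, newpos = handler(exc)
        out.append(replacement)
        # Simplification: assumes the handler returns a non-negative
        # position past pos, as the built-in handlers do.
        pos = newpos
    return "".join(out)
```

The handler only receives the offending byte range and a reason string, so
it never has to interpret the bogus code point value itself.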

--

[ python-Bugs-1251300 ] Decoding with unicode_internal segfaults on UCS-4 builds

2005-08-19 Thread SourceForge.net
Bugs item #1251300, was opened at 2005-08-03 21:49
Message generated for change (Comment added) made by lemburg
>Assigned to: Walter Dörwald (doerwalter)

>Comment By: M.-A. Lemburg (lemburg)
Date: 2005-08-19 17:45

Message:
Logged In: YES 
user_id=38388

Assigning to Walter, the error handler expert :-)

--

[ python-Bugs-1263656 ] IDLE on Mac

2005-08-19 Thread SourceForge.net
Bugs item #1263656, was opened at 2005-08-18 22:35
Message generated for change (Comment added) made by bsherwood
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1263656&group_id=5470

Category: IDLE
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Bruce Sherwood (bsherwood)
Assigned to: Nobody/Anonymous (nobody)
Summary: IDLE on Mac

Initial Comment:
Copying code from some browsers into IDLE on the Mac
can leave the file with only \r (13) at the ends of
lines (Safari doesn't seem to have this problem). Then
checksyntax() in ScriptBinding.py fails to convert
these into \n (10), and compile() fails. The effect is
that a program which Python is willing to run gets a
syntax error in IDLE. I think the fix is in
checksyntax() to add after

source = re.sub(r"\r\n", "\n", source)

the following statement, which converts unaccompanied
\r's into \n's::

source = re.sub(r"\r", "\n", source)

I've tried this and it works, but someone with a better
overview of end-of-line issues in Python should think
through whether this is the appropriate fix.
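
A small standalone sketch of the proposed two-step substitution, written
outside IDLE. The function name normalize_newlines is hypothetical; only the
two re.sub calls come from the report:

```python
import re


def normalize_newlines(source):
    """Convert CR LF pairs and lone CRs to LF (hypothetical helper)."""
    # Order matters: handle CR LF first so each pair becomes one newline.
    source = re.sub(r"\r\n", "\n", source)
    # Any CR still present was unaccompanied (old Mac style line ending).
    source = re.sub(r"\r", "\n", source)
    return source
```

Running the lone-CR substitution first would turn every CR LF pair into two
newlines, which is why the existing statement has to stay ahead of the new
one.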

--

>Comment By: Bruce Sherwood (bsherwood)
Date: 2005-08-19 12:16

Message:
Logged In: YES 
user_id=34881

I should have said that this is in the environment of
running IDLE on Mac OS X 10.4 under X11, using the fink
distribution. I should also say that there seem to be issues
not only of compiling but also of editing/display. In a
browser, click on a .py file, select all the text, copy,
paste into IDLE. With Safari, it looks right and it runs.
With Netscape, it displays all on one line, and it doesn't
run (syntax error). I haven't studied the actual code to see
what if anything IDLE does to attempt to detect the nature
of text pasted into an edit window, but clearly it's
different coming from two popular browsers.

--

Comment By: Bruce Sherwood (bsherwood)
Date: 2005-08-19 00:23

Message:
Logged In: YES 
user_id=34881

A footnote: Now I don't understand why the substitution
searches for r"\r\n", since this would seem to be the raw
string which represents slash, r, slash, n, not the
two-character string "\r\n"??
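
A short check of the raw-string question, not part of the original report.
The variable names are only for illustration. The raw string really is four
characters long, but the re module interprets the \r and \n escapes itself
when it compiles the pattern, so both spellings match a carriage return
followed by a line feed:

```python
import re

raw = r"\r\n"     # four characters: backslash, r, backslash, n
cooked = "\r\n"   # two characters: carriage return, line feed

assert len(raw) == 4 and len(cooked) == 2

# Both patterns match CR LF in the subject string.
assert re.sub(raw, "\n", "a\r\nb") == "a\nb"
assert re.sub(cooked, "\n", "a\r\nb") == "a\nb"
```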

--

[ python-Bugs-1264168 ] PyArg_ParseTupleAndKeywords doesn't handle I format correctl

2005-08-19 Thread SourceForge.net
Bugs item #1264168, was opened at 2005-08-19 10:31
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1264168&group_id=5470

Category: Python Interpreter Core
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: John Finlay (finlay648)
Assigned to: Nobody/Anonymous (nobody)
Summary: PyArg_ParseTupleAndKeywords doesn't handle I format correctl

Initial Comment:
PyArg_ParseTupleAndKeywords fails with the message
"...impossible"
when parsing an optional keyword parameter using the "I" format.

Using Python 2.3.5 but also observed in Python 2.4.x

The problem is a missing "I" handler in the skipitem
function.

I've attached a proposed patch.

--

[ python-Bugs-1264168 ] PyArg_ParseTupleAndKeywords doesn't handle I format correctl

2005-08-19 Thread SourceForge.net
Bugs item #1264168, was opened at 2005-08-19 19:31
Message generated for change (Comment added) made by birkenfeld
>Status: Closed
>Resolution: Duplicate

>Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-08-19 20:18

Message:
Logged In: YES 
user_id=1188172

Duplicate of #893549. See patch #1212928 to fix all missing
format codes.

--

[ python-Bugs-1264168 ] PyArg_ParseTupleAndKeywords doesn't handle I format correctl

2005-08-19 Thread SourceForge.net
Bugs item #1264168, was opened at 2005-08-19 10:31
Message generated for change (Comment added) made by finlay648

>Comment By: John Finlay (finlay648)
Date: 2005-08-19 11:40

Message:
Logged In: YES 
user_id=1331852

And how would one look up a bug by bug number? There
appears to be no obvious way to do this.

--

[ python-Bugs-1264168 ] PyArg_ParseTupleAndKeywords doesn't handle I format correctl

2005-08-19 Thread SourceForge.net
Bugs item #1264168, was opened at 2005-08-19 19:31
Message generated for change (Comment added) made by birkenfeld

>Comment By: Reinhold Birkenfeld (birkenfeld)
Date: 2005-08-19 20:48

Message:
Logged In: YES 
user_id=1188172

Either sort the bugs in SF by ID, or use a URL of the form

http://www.python.org/sf/

(number without '#')

--

[ python-Bugs-1264666 ] PEP 8 uses wrong raise syntax

2005-08-19 Thread SourceForge.net
Bugs item #1264666, was opened at 2005-08-20 00:17
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1264666&group_id=5470

Category: Documentation
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Steven Bethard (bediviere)
Assigned to: Nobody/Anonymous (nobody)
Summary: PEP 8 uses wrong raise syntax

Initial Comment:
Despite the recommendation in "Programming
Recommendations" to use::

raise ValueError('message')

instead of::

raise ValueError, 'message'

the PEP itself uses the second form under the "Maximum
Line Length" section::

raise ValueError, "sorry, you lose"
...
raise ValueError, "I don't think so"
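
For illustration only, a sketch of the call form the "Programming
Recommendations" section asks for, using a hypothetical check_width
function. The parentheses let a long message wrap without a backslash, and
the comma form is a syntax error in Python 3:

```python
def check_width(width, limit=79):
    """Return width unchanged, or raise if it exceeds limit (hypothetical)."""
    if width > limit:
        # Call form: the argument list can span lines inside the parentheses.
        raise ValueError(
            "line is %d characters wide, limit is %d" % (width, limit))
    return width
```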


--
