Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Antoine Pitrou
On Wed, 2 Nov 2011 19:41:30 -0700
Guido van Rossum  wrote:
> Apparently Macports is still using a buggy compiler.

If I understand things correctly, this is technically not a buggy
compiler but Python making optimistic assumptions about the C standard.
(from issue11149: "clang (as with gcc 4.x) assumes signed integer
overflow is undefined. But Python depends on the fact that signed
integer overflow wraps")

I'd happily call that a buggy C standard, though :-)

Regards

Antoine.
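
To illustrate the point about wrapping, here is a minimal Python sketch that reduces each product to a signed 64-bit value, which is roughly what a C long multiplication yields when overflow wraps silently and the overflow check never fires. The helper names wrap_int64 and wrapped_pow are hypothetical, and the 64-bit width is an assumption about the affected build; this is not the int_pow() code from CPython.

```python
def wrap_int64(value):
    """Reduce an arbitrary integer to the signed 64-bit range (two's complement)."""
    value &= (1 << 64) - 1          # keep only the low 64 bits
    if value >= 1 << 63:            # high bit set: interpret as negative
        value -= 1 << 64
    return value


def wrapped_pow(base, exponent):
    """Compute base**exponent as if every multiplication wrapped at 64 bits."""
    result = 1
    for _ in range(exponent):
        result = wrap_int64(result * base)
    return result


print(wrapped_pow(2, 100))
print(wrapped_pow(20, 20))
print(wrapped_pow(10, 20))
```

Under these assumptions the three calls give 0, -2101438300051996672 and 7766279631452241920, the same values as in the report quoted below.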


> I reported a
> similar issue before and got this reply from Ned Deily:
> 
> """
> Thanks for the pointer.  That looks like a duplicate of Issue11149 (and
> Issue12701).  Another manifestation of this was reported in Issue13061
> which also originated from MacPorts.  I'll remind them that the
> configure change is likely needed for all Pythons.  It's still safest to
> stick with good old gcc-4.2 on OS X at the moment.
> """
> 
> (Those issues are on bugs.python.org.)
> 
> --Guido
> 
> On Wed, Nov 2, 2011 at 7:32 PM, Derek Shockey  wrote:
> > I just found an unexpected behavior and I'm wondering if it is a bug.
> > In my 2.7.2 interpreter on OS X, built and installed via MacPorts, it
> > appears that integers are not correctly overflowing into longs and
> > instead are yielding bizarre results. I can only reproduce this when
> > using the exponent operator with two ints (declaring either operand
> > explicitly as long prevents the behavior).
> >
> > >>> 2**100
> > 0
> > >>> 2**100L
> > 1267650600228229401496703205376L
> >
> > >>> 20**20
> > -2101438300051996672
> > >>> 20L**20
> > 104857600000000000000000000L
> >
> > >>> 10**20
> > 7766279631452241920
> > >>> 10L**20L
> > 100000000000000000000L
> >
> > To confirm I'm not crazy, I tried in the 2.7.1 and 2.6.7 installations
> > included in OS X 10.7, and also a 2.7.2+ (not sure what the + is) on
> > an Ubuntu machine and didn't see this behavior. This looks like some
> > kind of truncation error, but I don't know much about the internals of
> > Python and have no idea what's going on. I assume since it's only in
> > my MacPorts installation, it must be a build configuration issue that is
> > specific to OS X, perhaps only 10.7, or MacPorts.
> >
> > Am I doing something wrong, and is there a way to fix it before I
> > compile? I couldn't find any references to this problem as a known issue.
> >
> > Thanks,
> > Derek
> > ___
> > Python-Dev mailing list
> > [email protected]
> > http://mail.python.org/mailman/listinfo/python-dev
> > Unsubscribe: 
> > http://mail.python.org/mailman/options/python-dev/guido%40python.org
> >
> 
> 
> 
> -- 
> --Guido van Rossum (python.org/~guido)


___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Victor Stinner
On Wednesday 2 November 2011 19:32:38, Derek Shockey wrote:
> I just found an unexpected behavior and I'm wondering if it is a bug.
> In my 2.7.2 interpreter on OS X, built and installed via MacPorts, it
> appears that integers are not correctly overflowing into longs and
> instead are yielding bizarre results. I can only reproduce this when
> using the exponent operator with two ints (declaring either operand
> explicitly as long prevents the behavior).
> 
> >>> 2**100
> 
> 0

This issue has already been fixed twice in the Python 2.7 branch: int_pow() has
been fixed and -fwrapv is now used for Clang.

http://bugs.python.org/issue11149
http://bugs.python.org/issue12973

Maybe it is time for a new release? :-)

Victor


Re: [Python-Dev] draft PEP: virtual environments

2011-11-03 Thread VanL

On 11/1/2011 11:43 AM, Paul Moore wrote:

On 1 November 2011 16:40, Paul Moore  wrote:

On 1 November 2011 16:29, Paul Moore  wrote:

On 31 October 2011 20:10, Carl Meyer  wrote:

For Windows, can you point me at the nt scripts? If they aren't too
complex, I'd be willing to port to Powershell.


For what it's worth, there have been a number of efforts in this direction:

https://bitbucket.org/guillermooo/virtualenvwrapper-powershell
https://bitbucket.org/vanl/virtualenvwrapper-powershell

(Both different implementations)



Re: [Python-Dev] Buildbot failures

2011-11-03 Thread Brian Curtin
On Sat, Oct 22, 2011 at 14:30, Andrea Crotti  wrote:
> On 10/21/2011 10:08 PM, Antoine Pitrou wrote:
>>
>> Hello,
>>
>> There are currently a bunch of various buildbot failures on all 3
>> branches. I would remind committers to regularly take a look at the
>> buildbots, so that these failures get solved reasonably fast.
>>
>> Regards
>>
>> Antoine.
>
> In my previous workplace if someone broke a build committing something
> wrong he/she had to bring cake for everyone next meeting.
>
> The cake is not really feasible I guess, but isn't it possible to notify
> the developer that broke the build?

You just have to keep track and bring all of the cakes that you owe to PyCon.


Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Éric Araujo
Hi Derek,

> I tried in the 2.7.1 and 2.6.7 installations included in OS X 10.7,
> and also a 2.7.2+ (not sure what the + is)

The + means that it’s 2.7.2 + some commits, in other words the
in-development version that will become 2.7.3.  This bit of info seems
to be missing from the doc.

Regards


[Python-Dev] Unicode exception indexing

2011-11-03 Thread martin

There is a backwards compatibility issue with PEP 393 and Unicode exceptions:
the start and end indices: are they Py_UNICODE indices, or code point indices?

On the one hand, these indices are used in formatting error messages such as
"codec can't encode character \u%04x in position %d", suggesting they are
regular indices into the string (counting code points).

On the other hand, they are used by error handlers to look up the character,
and existing error handlers (including the ones we have now) use
PyUnicode_AsUnicode to find the character. This suggests that the indices
should be Py_UNICODE indices, for compatibility (and they currently do
work in this way).

The indices can only differ if the string is a UCS-4 string and
Py_UNICODE is a two-byte type (i.e. on Windows).

So what should it be?

As a compromise, it would be possible to convert between these indices
by counting the non-BMP characters that precede the index whenever the indices
might differ. That would be expensive to compute, but would provide backwards
compatibility for the C API. It's less clear what backwards compatibility
for Python code would require; most likely, people would use the indices
for slicing operations (rather than performing a UTF-16 conversion and
indexing into that).
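
The conversion described above can be sketched in Python as follows. This is only an illustration of the counting idea, assuming a str indexed by code points (Python 3.3+); the function names are hypothetical and are not part of any existing API.

```python
def code_point_index_to_utf16_index(s, index):
    """Map a code point index to a UTF-16 code unit index.

    Every non-BMP character before `index` takes two code units
    (a surrogate pair), so it shifts the result by one. This is O(n).
    """
    return index + sum(1 for ch in s[:index] if ord(ch) > 0xFFFF)


def utf16_index_to_code_point_index(s, u16_index):
    """Map a UTF-16 code unit index back to a code point index.

    An index that falls inside a surrogate pair is rounded up to the
    following code point.
    """
    units = 0
    for i, ch in enumerate(s):
        if units >= u16_index:
            return i
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(s)


text = "a\U0001F600b"   # 3 code points, 4 UTF-16 code units
print(code_point_index_to_utf16_index(text, 2))
print(utf16_index_to_code_point_index(text, 3))
```

Both directions walk the string, which is the O(n) cost mentioned above; for strings without non-BMP characters the two kinds of index are identical.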

Regards,
Martin





Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Derek Shockey
I believe you're right. The 2.7.2 MacPorts portfile definitely passes
the -fwrapv flag to clang, but the bad behavior still occurs with
exponents. I verified the current head of the 2.7 branch does not have
this problem when built with clang, so I'm assuming that issue12973
resolved this with a patch to int_pow() and that it will be out in the
next release.

-Derek

On Thu, Nov 3, 2011 at 4:30 AM, Antoine Pitrou  wrote:
> On Wed, 2 Nov 2011 19:41:30 -0700
> Guido van Rossum  wrote:
>> Apparently Macports is still using a buggy compiler.
>
> If I understand things correctly, this is technically not a buggy
> compiler but Python making optimistic assumptions about the C standard.
> (from issue11149: "clang (as with gcc 4.x) assumes signed integer
> overflow is undefined. But Python depends on the fact that signed
> integer overflow wraps")
>
> I'd happily call that a buggy C standard, though :-)
>
> Regards
>
> Antoine.


Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Stefan Krah
Derek Shockey  wrote:
> I believe you're right. The 2.7.2 MacPorts portfile definitely passes
> the -fwrapv flag to clang, but the bad behavior still occurs with
> exponents.

Really? Even without the fix for issue12973 the -fwrapv flag
should be sufficient, as reported in issue13061 and Issue11149.

For clang version 3.0 (trunk 139691) on FreeBSD this is the case.


Stefan Krah




Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Victor Stinner
On Thursday 3 November 2011 18:14:42, [email protected] wrote:
> There is a backwards compatibility issue with PEP 393 and Unicode
> exceptions: the start and end indices: are they Py_UNICODE indices, or
> code point indices?

Oh oh. That's exactly why I didn't want to start to work on this issue.
http://bugs.python.org/issue13064

In a Python error handler, exc.object[exc.start:exc.end] should be used to get 
the unencodable/undecodable substring.

In a C error handler, it depends if you use a Py_UNICODE* pointer or 
PyUnicode_Substring() / PyUnicode_READ.
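
As a minimal sketch of the Python case, assuming Python 3 and the standard codecs.register_error() hook: the handler below slices exc.object with exc.start and exc.end and replaces the unencodable run with numeric character references. The handler name "html_replace_sketch" is hypothetical and only loosely modelled on the handlers listed below.

```python
import codecs


def html_replace_sketch(exc):
    """Replace unencodable characters with &#NNN; references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise TypeError("this sketch only handles encoding errors")
    # start/end are used as plain indices into the original string.
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Return the replacement text and the position where encoding resumes.
    return replacement, exc.end


codecs.register_error("html_replace_sketch", html_replace_sketch)

encoded = "caf\u00e9 \U0001F600".encode("ascii", "html_replace_sketch")
print(encoded)
```

The slice only gives the right substring if start and end count the same units as str indexing does, which is the question under discussion here.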

Using google.fr/codesearch, I found some user error handlers implemented in 
Python:
 * straw: "html_replace"
 * Nuxeo: "latin9_fallback"
 * peerscape: "htmlentityescape"
 * pymt: "cssescape"
 * 

I found no error handler implemented in C (no call to PyCodec_RegisterError).

> So what should it be?

I suggest using code point indices. Code point indices are also more
"natural" now with PEP 393.

Because it is an incompatible change, it should be documented in the PEP and 
in the "What's new in Python 3.3" document.

> As a compromise, it would be possible to convert between these indices,
> by counting the non-BMP characters that precede the index if the indices
> might differ.

I started such a hack for the UTF-8 codec... It is really tricky, we should not
do that!

> That would be expensive to compute

Yeah, O(n) should be avoided when it is possible.

--

FYI I implemented a proof-of-concept in Python of the surrogateescape error 
handler for Python 2 (for Mercurial):
https://bitbucket.org/haypo/misc/src/tip/python/surrogateescape.py

Victor


Re: [Python-Dev] Buildbot failures

2011-11-03 Thread Stefan Behnel

Brian Curtin, 03.11.2011 15:59:
> On Sat, Oct 22, 2011 at 14:30, Andrea Crotti wrote:
>> On 10/21/2011 10:08 PM, Antoine Pitrou wrote:
>>> Hello,
>>>
>>> There are currently a bunch of various buildbot failures on all 3
>>> branches. I would remind committers to regularly take a look at the
>>> buildbots, so that these failures get solved reasonably fast.
>>>
>>> Regards
>>>
>>> Antoine.
>>
>> In my previous workplace if someone broke a build committing something
>> wrong he/she had to bring cake for everyone next meeting.
>>
>> The cake is not really feasible I guess, but isn't it possible to notify
>> the developer that broke the build?
>
> You just have to keep track and bring all of the cakes that you owe to PyCon.


Did you mean "PieCon"?

Stefan



Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Antoine Pitrou
On Thu, 03 Nov 2011 18:14:42 +0100
[email protected] wrote:
> There is a backwards compatibility issue with PEP 393 and Unicode exceptions:
> the start and end indices: are they Py_UNICODE indices, or code point indices?
> 
> On the one hand, these indices are used in formatting error messages such as
> "codec can't encode character \u%04x in position %d", suggesting they are
> regular indices into the string (counting code points).
> 
> On the other hand, they are used by error handlers to lookup the character,
> and existing error handlers (including the ones we have now) use
> PyUnicode_AsUnicode to find the character. This suggests that the indices
> should be Py_UNICODE indices, for compatibility (and they currently do
> work in this way).

But what about error handlers written in Python?

> The indices can only be different if the string is an UCS-4 string, and
> Py_UNICODE is a two-byte type (i.e. on Windows).
> 
> So what should it be?

I'd say let's do the Right Thing and accept the small compatibility
breach (surrogates on UCS-2 builds).

Regards

Antoine.




Re: [Python-Dev] ints not overflowing into longs?

2011-11-03 Thread Derek Shockey
You're right; among my many tests I think I muddled the situation with
a stray CFLAGS variable in my environment. Apologies for the
misinformation. The current MacPorts portfile does not add -fwrapv.
Adding -fwrapv to OPT in the Makefile solves the problem. I confirmed
by manually building the v2.7.2 tag with clang and -fwrapv, and the
overflow behavior is correct. I've notified the MacPorts package
maintainer.
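
For anyone wanting to check an existing installation, here is a rough sketch, assuming the build recorded its compiler flags in the OPT, CFLAGS or PY_CFLAGS configuration variables (they may be missing on some platforms, e.g. Windows). The function name is hypothetical; sysconfig.get_config_var() is the standard-library call used to read the values.

```python
import sysconfig


def fwrapv_in_build_flags():
    """Return True if -fwrapv appears in the recorded build flags."""
    names = ("OPT", "CFLAGS", "PY_CFLAGS")
    # get_config_var() returns None for variables the build did not record.
    flags = " ".join(str(sysconfig.get_config_var(name) or "") for name in names)
    return "-fwrapv" in flags


print("-fwrapv recorded in build flags:", fwrapv_in_build_flags())
# Direct check of the behaviour reported in this thread.
print("2**100 correct:", 2**100 == 1267650600228229401496703205376)
```

A missing flag does not prove the build is broken (gcc 4.2 builds were reported as fine), so the direct arithmetic check is the more telling of the two.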


-Derek

On Thu, Nov 3, 2011 at 11:07 AM, Stefan Krah  wrote:
> Derek Shockey  wrote:
>> I believe you're right. The 2.7.2 MacPorts portfile definitely passes
>> the -fwrapv flag to clang, but the bad behavior still occurs with
>> exponents.
>
> Really? Even without the fix for issue12973 the -fwrapv flag
> should be sufficient, as reported in issue13061 and Issue11149.
>
> For clang version 3.0 (trunk 139691) on FreeBSD this is the case.
>
>
> Stefan Krah


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Guido van Rossum
On Thu, Nov 3, 2011 at 12:29 PM, Antoine Pitrou  wrote:
> On Thu, 03 Nov 2011 18:14:42 +0100
> [email protected] wrote:
>> There is a backwards compatibility issue with PEP 393 and Unicode exceptions:
>> the start and end indices: are they Py_UNICODE indices, or code point
>> indices?
>>
>> On the one hand, these indices are used in formatting error messages such as
>> "codec can't encode character \u%04x in position %d", suggesting they are
>> regular indices into the string (counting code points).
>>
>> On the other hand, they are used by error handlers to lookup the character,
>> and existing error handlers (including the ones we have now) use
>> PyUnicode_AsUnicode to find the character. This suggests that the indices
>> should be Py_UNICODE indices, for compatibility (and they currently do
>> work in this way).
>
> But what about error handlers written in Python?
>
>> The indices can only be different if the string is an UCS-4 string, and
>> Py_UNICODE is a two-byte type (i.e. on Windows).
>>
>> So what should it be?
>
> I'd say let's do the Right Thing and accept the small compatibility
> breach (surrogates on UCS-2 builds).

+1

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Terry Reedy

On 11/3/2011 3:16 PM, Victor Stinner wrote:
> On Thursday 3 November 2011 18:14:42, [email protected] wrote:
>> There is a backwards compatibility issue with PEP 393 and Unicode
>> exceptions: the start and end indices: are they Py_UNICODE indices, or
>> code point indices?

I had the impression that we were abolishing the wide versus narrow
build difference and that this issue would disappear. I must have missed
something.

>> So what should it be?
>
> I suggest to use code point indices. Code point indices is also now more
> "natural" with the PEP 393.

I think we should look forward, not backwards. Error messages are
defined as undefined ;-). So I think we should do what is right for the
new implementation. I suspect that means that I am agreeing with both
Victor and Antoine.

> Because it is an incompatible change, it should be documented in the PEP and
> in the "What's new in Python 3.3" document.
> ...
> Yeah, O(n) should be avoided when is it possible.

Definitely to both.

--
Terry Jan Reedy




Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Martin v. Löwis
On 03.11.2011 22:19, Terry Reedy wrote:
> On 11/3/2011 3:16 PM, Victor Stinner wrote:
>> On Thursday 3 November 2011 18:14:42, [email protected] wrote:
>>> There is a backwards compatibility issue with PEP 393 and Unicode
>>> exceptions: the start and end indices: are they Py_UNICODE indices, or
>>> code point indices?
> 
> I had the impression that we were abolishing the wide versus narrow
> build difference and that this issue would disappear. I must have missed
> something.

Most certainly. The Py_UNICODE type continues to exist for backwards
compatibility. It is now always a typedef for wchar_t, which makes it
a 16-bit type on Windows.

Regards,
Martin


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Martin v. Löwis
>> On the one hand, these indices are used in formatting error messages such as
>> "codec can't encode character \u%04x in position %d", suggesting they are
>> regular indices into the string (counting code points).
>>
>> On the other hand, they are used by error handlers to lookup the character,
>> and existing error handlers (including the ones we have now) use
>> PyUnicode_AsUnicode to find the character. This suggests that the indices
>> should be Py_UNICODE indices, for compatibility (and they currently do
>> work in this way).
> 
> But what about error handlers written in Python?

I'm working on a patch where a C error handler using
PyUnicodeEncodeError_GetStart gets a different value than a Python
error handler accessing .start. The _GetStart/_GetEnd functions would
take the value from the exception object, and adjust it before returning
it.

The implementation is fairly straightforward, just a little expensive
(in the case of non-BMP strings on Windows).

Regards,
Martin


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Martin v. Löwis
> I started such hack for the UTF-8 codec... It is really tricky, we should not 
> do that!

With the proper encapsulation, it's not that tricky. I have written
functions PyUnicode_IndexToWCharIndex and PyUnicode_WCharIndexToIndex,
and PyUnicodeEncodeError_GetStart and friends would use those functions.
I'd also need new functions such as PyUnicodeEncodeError_GetStartIndex to
access the "true" start field.

>> That would be expensive to compute
> 
> Yeah, O(n) should be avoided when is it possible.

Ok. I'll wait half a day or so for people to reconsider (now knowing
that it's actually feasible to be fully backwards compatible); if nobody
speaks up, I'll go ahead and accept the breakage.

Regards,
Martin


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Terry Reedy



On 11/3/2011 5:43 PM, "Martin v. Löwis" wrote:
>> I had the impression that we were abolishing the wide versus narrow
>> build difference and that this issue would disappear. I must have missed
>> something.
>
> Most certainly. The Py_UNICODE type continues to exist for backwards
> compatibility. It is now always a typedef for wchar_t, which makes it
> a 16-bit type on Windows.


Thank you for answering. My revised impression now is that any string I
create with Python code in Python 3.3+ (as distributed, without
extensions or ctypes calls) will use the new implementation and will
index and slice correctly, even with extended chars. So indexing is
only an issue for those writing or using C-coded extensions with the old
unicode C-API on systems with a 16-bit wchar_t. Correct?
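
A small sketch of that impression, assuming Python 3.3 or later: a string containing one non-BMP character is indexed and sliced by code point, while its UTF-16 form (what a 16-bit wchar_t buffer would hold) has one more unit. The variable names are only illustrative.

```python
sample = "a\U0001F600b"

# Python-level view: three code points, regardless of platform.
code_points = len(sample)
middle = sample[1]
tail = sample[2:]

# UTF-16 view: the non-BMP character becomes a surrogate pair.
utf16_units = len(sample.encode("utf-16-le")) // 2

print(code_points, utf16_units)
```

Only code working on the UTF-16 view needs to worry about the extra unit.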


---
Terry Jan Reedy



Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Nick Coghlan
Your approach (doing the right thing for both Python and C, new API to
avoid the C performance problem) sounds good to me.

--
Nick Coghlan (via Gmail on Android, so likely to be more terse than usual)
On Nov 4, 2011 7:58 AM, Martin v. Löwis  wrote:

> > I started such hack for the UTF-8 codec... It is really tricky, we should
> > not do that!
>
> With the proper encapsulation, it's not that tricky. I have written
> functions PyUnicode_IndexToWCharIndex and PyUnicode_WCharIndexToIndex,
> and PyUnicodeEncodeError_GetStart and friends would use that function.
> I'd also need new functions PyUnicodeEncodeError_GetStartIndex to access
> the "true" start field.
>
> >> That would be expensive to compute
> >
> > Yeah, O(n) should be avoided when is it possible.
>
> Ok. I'll wait half a day or so for people to reconsider (now knowing
> that it's actually feasible to be fully backwards compatible); if nobody
> speaks up, I go ahead and accept the breakage.
>
> Regards,
> Martin


Re: [Python-Dev] Unicode exception indexing

2011-11-03 Thread Antoine Pitrou
On Thu, 03 Nov 2011 22:47:00 +0100
"Martin v. Löwis"  wrote:

> >> On the one hand, these indices are used in formatting error messages such
> >> as "codec can't encode character \u%04x in position %d", suggesting they
> >> are regular indices into the string (counting code points).
> >>
> >> On the other hand, they are used by error handlers to lookup the character,
> >> and existing error handlers (including the ones we have now) use
> >> PyUnicode_AsUnicode to find the character. This suggests that the indices
> >> should be Py_UNICODE indices, for compatibility (and they currently do
> >> work in this way).
> > 
> > But what about error handlers written in Python?
> 
> I'm working on a patch where an C error handler using
> PyUnicodeEncodeError_GetStart gets a different value than a Python
> error handler accessing .start. The _GetStart/_GetEnd functions would
> take the value from the exception object, and adjust it before returning
> it.

Is it worth the hassle? We can just port our existing error handlers,
and I guess the few third-party error handlers written in C (if any)
can bear the transition.

Regards

Antoine.