Guido van Rossum <gu...@python.org> added the comment:

Wow.  A very educational discussion.  We will be referencing this issue for 
many years to come.

As long as the buck stops with me, I feel strongly that *today* changing 
indexing from O(1) to O(log N) is a bad idea, partly for technical reasons, 
partly because the Python culture isn't ready.  In 5 or 10 years we need to 
revisit this, and it wouldn't hurt if in the meantime we started seriously 
thinking about how to change our APIs so that O(1) indexing is not relied upon 
so much.  This may include rewriting tutorials to nudge users in the direction 
of using different idioms for text processing.
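As a rough illustration of the kind of idiom shift meant here, the sketch below contrasts position-based string handling with equivalents that never expose a code-point index. The function names are hypothetical and the examples are not taken from this discussion. Note that `split_pair_indexed` raises `ValueError` when there is no "=", while `str.partition` returns an empty value instead.

```python
# Index-based style: computes positions and slices by them,
# which quietly assumes s[i] and s[a:b] are cheap.
def split_pair_indexed(line):
    i = line.index("=")          # raises ValueError if "=" is missing
    return line[:i], line[i + 1:]


def count_vowels_indexed(s):
    n = 0
    for i in range(len(s)):
        if s[i] in "aeiou":
            n += 1
    return n


# Index-free style: the same results without handing positions to user code.
def split_pair(line):
    key, _, value = line.partition("=")   # missing "=" gives an empty value
    return key, value


def count_vowels(s):
    return sum(ch in "aeiou" for ch in s)


print(split_pair("name=value"))    # ('name', 'value')
print(count_vowels("education"))   # 5
```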

In the meantime, I think our best option is to switch CPython to the PEP 393 
string implementation.  Despite its disadvantages (I understand the "spoiler" 
issue), it is generally no worse than a wide build, and there is working code 
today that we can optimize before 3.3 is released.
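PEP 393 stores each string with 1, 2 or 4 bytes per code point, chosen by the largest code point in that string. The sketch below models only that choice. The helper names are hypothetical, and the size figure counts character data alone, ignoring the object header and any cached UTF-8 copy. It also shows one reading of the "spoiler" issue: a single astral character forces the whole string into 4-byte storage.

```python
def pep393_char_width(s):
    """Bytes per code point PEP 393 would pick for s (hypothetical helper)."""
    # The widest code point decides the storage unit for the whole string.
    widest = max(map(ord, s), default=0)
    if widest < 0x100:
        return 1  # Latin-1 range, including pure ASCII
    if widest < 0x10000:
        return 2  # Basic Multilingual Plane (UCS-2)
    return 4      # astral code points (UCS-4)


def payload_bytes(s):
    """Character-data size only; ignores the object header and cached UTF-8."""
    return len(s) * pep393_char_width(s)


# One astral character widens every other character in the string.
ascii_text = "a" * 1000
spoiled = ascii_text + "\U0001F600"
print(payload_bytes(ascii_text))  # 1000
print(payload_bytes(spoiled))     # 4004
```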

For Python implementations where this is not an option (I'm thinking Jython and 
IronPython, both of which are closely tied to a system string type that behaves 
like UTF-16) I hope that at least the regular expression behavior can be fixed 
so that "." matches a surrogate pair.  (Possibly they already behave that way, 
if they use a native regex library.)
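To make the "." problem concrete: on a UTF-16-style string, an astral character occupies two code units (a surrogate pair), and a plain "." matches only one of them. The sketch below simulates that storage on a modern CPython, which has no narrow builds, using a hypothetical helper `to_utf16_units`. It then shows a pair-aware pattern, `PAIR_AWARE_DOT` (also hypothetical), that tries a full surrogate pair before falling back to a single unit. This is a user-level workaround for illustration, not how Jython or IronPython actually handle it.

```python
import re


def to_utf16_units(s):
    """Hypothetical helper: spell s the way a narrow build stores it,
    splitting each astral code point into a UTF-16 surrogate pair."""
    out = []
    for ch in s:
        cp = ord(ch)
        if cp > 0xFFFF:
            cp -= 0x10000
            out.append(chr(0xD800 + (cp >> 10)))    # high surrogate
            out.append(chr(0xDC00 + (cp & 0x3FF)))  # low surrogate
        else:
            out.append(ch)
    return "".join(out)


# "." that consumes a whole surrogate pair before falling back to one unit.
PAIR_AWARE_DOT = re.compile(r"[\ud800-\udbff][\udc00-\udfff]|.", re.DOTALL)

narrow = to_utf16_units("a\U0001F600b")
print(len(narrow))                          # 4 code units
print(len(re.findall(".", narrow)))         # 4: plain "." splits the pair
print(len(PAIR_AWARE_DOT.findall(narrow)))  # 3: the pair is one match
```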

In all cases, for future Python versions, we should tighten the codecs to 
reject data that the Unicode standard considers invalid (and we should offer 
separate non-strict codecs for situations where such invalid data needs to be 
processed).
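One way to get this strict/lenient split is shown below. Rather than the separate codecs proposed above, the sketch uses the error handlers that current Python 3 releases provide on the single "utf-8" codec: the default "strict" handler rejects an encoded surrogate, and "surrogatepass" explicitly lets it through. The constant and function names are illustrative only.

```python
ENCODED_SURROGATE = b"\xed\xa0\x80"  # U+D800 run through a UTF-8-style encoder


def decode_strict(data):
    # Default "strict" handler: invalid UTF-8, including encoded surrogates, raises.
    return data.decode("utf-8")


def decode_lenient(data):
    # Opt-in: "surrogatepass" passes encoded surrogates through as lone surrogates.
    return data.decode("utf-8", "surrogatepass")


try:
    decode_strict(ENCODED_SURROGATE)
except UnicodeDecodeError:
    print("strict decode rejected the encoded surrogate")
print(repr(decode_lenient(ENCODED_SURROGATE)))  # '\ud800'
```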

I wish we could fix the codecs and the regex "." issue on narrow builds for 
Python versions before 3.3 (esp. 3.2 and 2.7), but I fear that this is 
considered too backwards incompatible (though for each specific fix we should 
consider this carefully).

----------
nosy: +gvanrossum

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________
_______________________________________________
Python-bugs-list mailing list