On Wed, Mar 13, 2013 at 9:11 PM, rusi <rustompm...@gmail.com> wrote:
> Uhhh..
> Making the subject line useful for all readers

I should have read this one before replying in the other thread.

jmf, I'd like to see evidence that there has been a performance regression compared against a wide build of Python 3.2. You have still never answered this fundamental point: the narrow builds of Python are *BUGGY* in the same way that JavaScript/ECMAScript is. And believe you me, the utterly unnecessary hassles I have had to deal with when permitting user-provided .js code to script my engine have wasted rather more dev hours than you would believe - there are rather a lot of stupid edge cases to deal with.

The PEP 393 string is simply a memory-optimized version of UTF-32. It guarantees O(1) indexing and slicing, while still remaining tight in many cases. Its worst case is a constant amount larger than pure UTF-32 (the overhead of recording the string width); its best case is as compact as ASCII, at one byte per character (if all strings are seven-bit).

The flexible string representation is not brand new. It has been tested and proven in another language, one very similar to Python, and its performance has been provably sufficient for everyday operations. Pike's string type behaves just as Python 3.3's does, and has done so for longer than I can trace backward. In terms of Unicode compliance, it is perfect; in terms of performance, quite acceptable. The worst-case operation is taking an ASCII string and overwriting one character in it with an astral character - which Python flat-out doesn't permit (its strings are immutable), but Pike does, as a known-slow operation. (It triggers a copy of the string, so it's always going to be slow.)

There are two broad areas of complaint that you have raised. One is of Unicode compliance and correctness. I believe those complaints are utterly unfounded, and you have yet to show any serious evidence to support them. Py 3.3 is perfectly compliant with everything I have yet checked. The other complaint is of performance, and the issue of being US-centric.
While it's true that ASCII and Latin-1 strings will be smaller/faster under Py 3.3 than 3.2, this is not purely to the benefit of the US at the cost of everyone else; it also benefits the myriad non-US programs that use a lot of ASCII strings: delimiters, HTML tags, builtin function names... all of these are ASCII, even if the rest of the code isn't. And there's no penalty for non-English speakers when compared against a non-buggy wide build. The very worst case is only a constant factor worse, and that assumes astral characters in every single string... which does not happen, trust me on that.

ChrisA
--
http://mail.python.org/mailman/listinfo/python-list
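
The storage claims above can be checked directly. What follows is only a minimal sketch, assuming CPython 3.3 or later (sys.getsizeof is implementation-specific, and other interpreters may not support it). The helper name bytes_per_char and the sample characters are hypothetical, chosen purely for illustration. It estimates the per-character cost of a string by comparing two lengths of the same repeated character, so the fixed header overhead cancels out.

```python
import sys

def bytes_per_char(sample_char, n=1000):
    """Estimate storage per character by differencing two string sizes.

    Hypothetical helper for illustration; relies on CPython's
    sys.getsizeof, so the result is implementation-specific.
    """
    base = sample_char * n
    doubled = sample_char * (2 * n)
    # The fixed per-object header cancels out in the subtraction.
    return (sys.getsizeof(doubled) - sys.getsizeof(base)) / n

samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),
    ("BMP", "\u20ac"),
    ("astral", "\U0001F600"),
]

for label, ch in samples:
    print("{:8s} {:.1f} bytes/char".format(label, bytes_per_char(ch)))

# Indexing counts code points, not UTF-16 code units, so an astral
# character is one element of the string.
mixed = "a\U0001F600b"
print(len(mixed), mixed[1] == "\U0001F600")
```

On CPython 3.3+ this should report 1, 1, 2 and 4 bytes per character for the four samples, and a length of 3 for the mixed string. A narrow build of 3.2 would instead report a length of 4 for that last string, because the astral character is stored there as a surrogate pair.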