On Mon, Aug 20, 2012 at 3:34 AM, Terry Reedy <tjre...@udel.edu> wrote:
> On 8/19/2012 4:04 AM, Paul Rubin wrote:
>> I realize the folks who designed and implemented PEP 393 are very smart
>> cookies and considered stuff carefully, while I'm just an internet user
>> posting an immediate impression of something I hadn't seen before (I
>> still use Python 2.6), but I still have to ask: if the 393 approach
>> makes sense, why don't other languages do it?
>
> Python has often copied or borrowed, with adjustments. This time it is the
> first. We will see how it goes, but it has been tested for nearly a year
> already.

Maybe it wasn't consciously borrowed, but whatever innovation is done, there's usually an obscure beardless language that did it earlier. :)

Pike has a single string type, which can use the full Unicode range. If all codepoints are <256, the string width is 8 (measured in bits); if <65536, width is 16; otherwise 32. Using the inbuilt count_memory function (similar to the Python function used somewhere earlier in this thread, but which I can't at present put my finger on), I find that for strings of 16 bytes or more, there's a fixed 20-byte header plus the string content, stored in the correct number of bytes. (Pike strings, like Python ones, are immutable and do not need expansion room.)

However, Python goes a bit further by making it VERY clear that this is a mere optimization, and that Unicode strings and bytes strings are completely different beasts. In Pike, it's possible to forget to encode something before (say) writing it to a socket. Everything works fine while you have only ASCII characters in the string, and then breaks when you have a >255 codepoint - or perhaps worse, when you have a codepoint in the range 127<x<256, and the other end misinterprets it.

Really, the only viable alternative to PEP 393 is a fixed 32-bit representation - it's the only way that's guaranteed to provide equivalent semantics. The new storage format is guaranteed to take no more memory than that, and provide equivalent functionality.

ChrisA
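
For anyone who wants to poke at the width rule described above, here is a minimal sketch in Python. It only models the "narrowest width that fits the highest codepoint" idea using the 256 / 65536 thresholds quoted for Pike; it is not how either interpreter is actually implemented. The names storage_width, content_bytes and cpython_bytes_per_char are made up for illustration, and the last helper assumes CPython 3.3 or later, where sys.getsizeof reflects the PEP 393 layout (an implementation detail, not a language guarantee).

```python
import sys


def storage_width(s: str) -> int:
    """Per-character width in bits under the narrowest-fit rule."""
    top = max(map(ord, s), default=0)  # highest codepoint; 0 for ""
    if top < 256:
        return 8
    if top < 65536:
        return 16
    return 32


def content_bytes(s: str) -> int:
    """Bytes needed for the character data alone (no object header)."""
    return len(s) * storage_width(s) // 8


def cpython_bytes_per_char(ch: str, n: int = 100) -> int:
    """Growth in sys.getsizeof when one more copy of ch is appended.

    Assumption: CPython 3.3+; other implementations may report
    something entirely different.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


if __name__ == "__main__":
    samples = ("hello", "h\u00e9llo", "h\u0142llo", "h\U0001f600llo")
    for s in samples:
        # Columns: string, width in bits, content bytes, fixed 32-bit bytes
        print(ascii(s), storage_width(s), content_bytes(s), 4 * len(s))
    for ch in ("a", "\u0142", "\U0001f600"):
        print(ascii(ch), cpython_bytes_per_char(ch))
```

The third and fourth columns of the first loop put the variable-width content size next to what a fixed 32-bit representation would need; the former can never exceed the latter, since the widest case is exactly 4 bytes per character.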