On Mon, Aug 20, 2012 at 3:34 AM, Terry Reedy <tjre...@udel.edu> wrote:
> On 8/19/2012 4:04 AM, Paul Rubin wrote:
>> I realize the folks who designed and implemented PEP 393 are very smart
>> cookies and considered stuff carefully, while I'm just an internet user
>> posting an immediate impression of something I hadn't seen before (I
>> still use Python 2.6), but I still have to ask: if the 393 approach
>> makes sense, why don't other languages do it?
>
> Python has often copied or borrowed, with adjustments. This time it is the
> first. We will see how it goes, but it has been tested for nearly a year
> already.

Maybe it wasn't consciously borrowed, but whatever innovation is done,
there's usually an obscure beardless language that did it earlier. :)

Pike has a single string type, which can use the full Unicode range.
If all codepoints are <256, the string width is 8 (measured in bits);
if <65536, width is 16; otherwise 32. Using the inbuilt count_memory
function (similar to the Python function used somewhere earlier in
this thread, but which I can't at present put my finger on), I find
that for strings of 16 bytes or more, there's a fixed 20-byte header
plus the string content, stored in the correct number of bytes. (Pike
strings, like Python ones, are immutable and do not need expansion
room.)
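To make the width rule concrete, here's a rough Python model of it. It is not Pike code. pike_width_bits and estimated_pike_size are hypothetical helpers I made up for illustration. The 20-byte header is just the figure I got from count_memory above, and it's only meaningful for strings of 16 bytes or more. (CPython's PEP 393 strings pick 1, 2 or 4 bytes per character in much the same way.)

```python
def pike_width_bits(s: str) -> int:
    """Per-character width (in bits) a Pike-style string would use.

    Rule as described above: 8 bits if every codepoint is < 256,
    16 bits if every codepoint is < 65536, otherwise 32 bits.
    """
    top = max(map(ord, s), default=0)  # empty string -> narrowest width
    if top < 256:
        return 8
    if top < 65536:
        return 16
    return 32


# Hypothetical constant: the fixed header measured with count_memory,
# observed only for strings whose content is 16 bytes or more.
PIKE_HEADER_BYTES = 20


def estimated_pike_size(s: str) -> int:
    """Rough estimate: fixed header plus content at the chosen width."""
    return PIKE_HEADER_BYTES + len(s) * pike_width_bits(s) // 8


for sample in ["x" * 16, "\u0101" * 16, "\U0001F600" * 16]:
    print(f"U+{ord(sample[0]):04X} x16: {pike_width_bits(sample)}-bit,"
          f" ~{estimated_pike_size(sample)} bytes")
```

A single wide character is enough to widen the whole string, which is why an otherwise-ASCII string with one emoji costs four bytes per character.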

However, Python goes a bit further by making it VERY clear that this
is a mere optimization, and that Unicode strings and bytes strings are
completely different beasts. In Pike, it's possible to forget to
encode something before (say) writing it to a socket. Everything works
fine while you have only ASCII characters in the string, and then
breaks when you have a codepoint >255 - or, perhaps worse, one in the
range 127<x<256, which the other end may misinterpret.
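For contrast, here's roughly what that looks like on the Python 3 side. Mixing text and bytes fails straight away, while a wrong guess about the encoding on the receiving end fails silently. Latin-1 as "the other end's" encoding is only an assumption for the example.

```python
# Python 3 refuses to mix text and bytes, so a forgotten encode fails
# immediately instead of only when non-ASCII data turns up.
try:
    b"HELLO " + "world"
except TypeError as exc:
    print("TypeError:", exc)

text = "caf\u00e9"  # U+00E9 'é' falls in the 127 < x < 256 range

# Explicit encoding: UTF-8 stores 'é' as the two bytes C3 A9.
wire = text.encode("utf-8")
print(wire)  # b'caf\xc3\xa9'

# If the receiver assumes Latin-1, nothing fails; it just gets mojibake.
print(wire.decode("latin-1"))  # cafÃ©

# A codepoint above 255 cannot be encoded as Latin-1 at all.
try:
    "\u0101".encode("latin-1")
except UnicodeEncodeError as exc:
    print("UnicodeEncodeError:", exc)
```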

Really, the only viable alternative to PEP 393 is a fixed 32-bit
representation - it's the only way that's guaranteed to provide
equivalent semantics. The new storage format is guaranteed to take no
more memory than that, and to provide equivalent functionality.
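A quick way to see both halves of that on CPython 3.3+ is sketched below. bytes_per_char is a hypothetical helper: it compares sys.getsizeof for strings of length n and 2n so the fixed object header cancels out. Reported sizes are an implementation detail, so treat this as CPython-specific.

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate CPython's per-character storage for a string made of `ch`.

    The header is the same for both lengths, so the size difference
    divided by n is the per-character width chosen under PEP 393.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


# Never more than 4 bytes per character, i.e. never worse than fixed UCS-4.
for ch in ["a", "\u00e9", "\u0101", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {bytes_per_char(ch)} byte(s) per char")

# Semantics match a fixed 32-bit representation whatever the storage
# width: one codepoint is one element for len() and indexing.
s = "a\U0001F600b"
print(len(s), s[1] == "\U0001F600")  # 3 True
```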

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
