On Mon, Sep 9, 2013, at 10:28, wxjmfa...@gmail.com wrote:
*time performance differences*
>
> Comment: Such differences never happen with utf.

Why is this bad? Keep in mind that otherwise they would all be almost as slow as the UCS-4 case.

> >>> sys.getsizeof('a')
> 26
> >>> sys.getsizeof('€')
> 40
> >>> sys.getsizeof('\U0001d11e')
> 44
>
> Comment: 18 bytes more than latin-1
>
> Comment: With utf, a char (in string or not) never exceed 4

A string is an object and needs to store its length, along with any overhead relating to object headers. I believe there is also an appended null character. Also, ASCII strings are stored differently from Latin-1 strings.

>>> sys.getsizeof('a'*999)
1048
= 49 bytes overhead, 1 byte per character.

>>> sys.getsizeof('\xa4'*999)
1072
= 73 bytes overhead, 1 byte per character.

>>> sys.getsizeof('\u20ac'*999)
2072
= 74 bytes overhead, 2 bytes per character.

>>> sys.getsizeof('\U0001d11e'*999)
4072
= 76 bytes overhead, 4 bytes per character.

(I bet sys.getsizeof('\xa4') will return 38 on your system, so 44 is only six bytes more, not 18.)

If we did not have the FSR, everything would be 4 bytes per character. We might have less overhead, but a string only has to be 25 characters long before the savings from the shorter representation outweigh even having _no_ overhead, and every four bytes of overhead reduces that number by one. And you have a 32-bit Python build, which has less overhead than mine: in yours, strings only have to be seven characters long for the FSR to be worth it.

Assume the minimum possible overhead is two words for the object header, a size, and a pointer, i.e. sixteen bytes, compared to the 25 you've demonstrated for ASCII, and strings only need to be _two_ characters long for the FSR to be a better deal than always using UCS-4 strings.

The need for four-byte-per-character strings would not go away by eliminating the FSR, so you're basically saying that everything should be constrained to the worst-case performance scenario.

--
https://mail.python.org/mailman/listinfo/python-list
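
For anyone who wants to redo the break-even arithmetic with their own numbers, here is a minimal sketch. The function name and parameters are hypothetical (they are not part of CPython); it just solves "overhead + width*n < UCS-4 overhead + 4*n" for the smallest n. The default UCS-4 overhead of zero is the deliberately generous "no overhead at all" assumption from the message, and the overhead figures you feed in are whatever sys.getsizeof reports on your own build.

```python
def break_even_length(fsr_overhead, bytes_per_char, ucs4_overhead=0):
    """Smallest length n at which a compact string beats a UCS-4 string.

    Solves fsr_overhead + bytes_per_char*n < ucs4_overhead + 4*n for the
    smallest non-negative integer n. Returns None when bytes_per_char is 4,
    since the per-character cost is then identical and length never helps.
    """
    if bytes_per_char not in (1, 2, 4):
        raise ValueError("bytes_per_char must be 1, 2 or 4")
    saved_per_char = 4 - bytes_per_char
    if saved_per_char == 0:
        return None
    extra_overhead = fsr_overhead - ucs4_overhead
    # Smallest integer strictly greater than extra_overhead / saved_per_char,
    # clamped at zero for the case where the compact form is never larger.
    return max(0, extra_overhead // saved_per_char + 1)


if __name__ == "__main__":
    # 73 bytes of overhead, 1 byte per character, versus a hypothetical
    # UCS-4 string with no overhead at all.
    print(break_even_length(73, 1))  # 25
```

With 73 bytes of overhead at 1 byte per character this gives 25, which matches the figure quoted above. Passing a nonzero ucs4_overhead shows how quickly that number drops once the hypothetical UCS-4 string is charged for its own header.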