On Thu, 11 Jul 2013 11:42:26 -0700, wxjmfauth wrote:

> And what to say about this "ucs4" char/string '\U0001d11e' which is
> weighing 18 bytes more than an "a".
>
>>>> sys.getsizeof('\U0001d11e')
> 44
>
> A total absurdity.
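For context on where those 44 bytes go: under CPython 3.3's flexible
string representation (PEP 393), every string carries a fixed header
plus 1, 2 or 4 bytes per character, chosen by the widest code point
in the string. The header dominates tiny strings; the per-character
cost is what scales. A minimal sketch, assuming any CPython 3.3+
build (the absolute getsizeof figures vary between builds, but the
marginal cost per character does not):

py> import sys
py> s1, s2, s4 = 'a', '\u20ac', '\U0001d11e'
py> sys.getsizeof(s1*4) - sys.getsizeof(s1)  # 1 byte per extra char
3
py> sys.getsizeof(s2*4) - sys.getsizeof(s2)  # 2 bytes per extra char
6
py> sys.getsizeof(s4*4) - sys.getsizeof(s4)  # 4 bytes per extra char
12

So most of that 44 bytes is the fixed header, which an "a" pays too.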
You should stick to Python 3.1 and 3.2 then:

py> print(sys.version)
3.1.3 (r313:86834, Nov 28 2010, 11:28:10)
[GCC 4.4.5]
py> sys.getsizeof('\U0001d11e')
36
py> sys.getsizeof('a')
36

Now all your strings will be just as heavy: every single variable
name and attribute name will use four times as much memory. Happy
now?

> How does it come about? Very simple: once you split Unicode into
> subsets, not only do you have to handle these subsets, you have to
> create "markers" to differentiate them. Not only do you produce
> "markers", you have to handle the mess generated by these
> "markers". Hiding these markers in the overhead of the class does
> not mean that they should not be counted as part of the coding
> scheme. BTW, since when does a serious coding scheme need an
> external marker?

Since always. How do you think that (say) a C compiler can tell the
difference between the long 1199876496 and the float 67923.125? They
both have exactly the same four bytes:

py> import struct
py> struct.pack('f', 67923.125)
b'\x90\xa9\x84G'
py> struct.pack('l', 1199876496)
b'\x90\xa9\x84G'

*Everything* in a computer is bytes. The only way to tell them apart
is by external markers; struct.unpack makes the same point from the
reading side (see the sketch after the sig).

-- 
Steven
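A minimal sketch of that round trip: the same four bytes come back
as a float or an integer depending solely on which format code (the
external marker) you hand struct.unpack. The '<' prefix forces the
standard 4-byte sizes, since a bare 'l' is 8 bytes on most 64-bit
Unix builds:

py> import struct
py> struct.unpack('<f', b'\x90\xa9\x84G')
(67923.125,)
py> struct.unpack('<l', b'\x90\xa9\x84G')
(1199876496,)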