On Saturday, 8 February 2014 at 03:48:12 UTC+1, Steven D'Aprano wrote:
>
> We consider it A GOOD THING that Python spends memory for programmer
> convenience and safety. Python looks for memory optimizations when it can
> save large amounts of memory, not utterly trivial amounts. So in a Python
> wide build, a ten-thousand block character string requires a little bit
> more than 40KB. In Python 3.3, that can be reduced to only 10KB for a
> purely Latin-1 string, or 20K for a string without any astral characters.
> That's the sort of memory savings that are worthwhile, reducing memory
> usage by 75%.
>
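For reference, the per-character widths quoted above can be checked with a minimal sketch like the one below. It assumes CPython 3.3 or later (the flexible string representation); the helper name bytes_per_char is hypothetical, made up for illustration only. It compares the sizes of two strings built from the same character, so the fixed object header cancels out.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate how many bytes CPython stores per character for a
    string consisting only of `ch` (hypothetical helper, CPython-specific)."""
    short = ch * n
    longer = ch * (2 * n)
    # The fixed header is the same for both strings, so the difference
    # is the storage cost of the n extra characters.
    return (sys.getsizeof(longer) - sys.getsizeof(short)) // n

samples = [
    ("ASCII", "a"),
    ("Latin-1", "\xe9"),        # e with acute accent
    ("BMP", "\u0153"),          # oe ligature, outside Latin-1
    ("astral", "\U00010000"),   # first code point beyond the BMP
]

for label, ch in samples:
    print(label, bytes_per_char(ch))
```

On such a build this should report 1, 1, 2 and 4 bytes per character respectively: the width is chosen per string, from the widest code point the string contains.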
In its attempt to save memory, Python only succeeds in doing worse than
any of the utf* encoding schemes.

---

Python does not save memory at all. A str (unicode string) uses less
memory only, and only, because and when one explicitly uses characters
that consume less memory.

Not only is the memory gain zero, Python falls back to the worst case.

>>> import sys
>>> sys.getsizeof('a' * 1000000)
1000025
>>> sys.getsizeof('a' * 1000000 + 'œ')
2000040
>>> sys.getsizeof('a' * 1000000 + 'œ' + '\U00010000')
4000048

The opposite of what utf-8 and utf-16 do!

>>> sys.getsizeof(('a' * 1000000 + 'œ' + '\U00010000').encode('utf-8'))
1000023
>>> sys.getsizeof(('a' * 1000000 + 'œ' + '\U00010000').encode('utf-16'))
2000025

jmf

--
https://mail.python.org/mailman/listinfo/python-list