On Fri, 08 Nov 2013 12:43:43 -0800, wxjmfauth wrote:

> "(say, 1 kbyte each)": one "kilo" of characters or bytes?
>
> Glad to read some users are still living in an ascii world, at the
> "Unicode time" where an encoded code point size may vary between 1-4
> bytes.
>
> Oops, sorry, I'm wrong,

That part is true.

> it can be much more.

That part is false. You're measuring the overhead of the object
structure, not the per-character storage. This has been the case going
back to at least Python 2.2: strings are objects, and objects have
overhead.

>>>> sys.getsizeof('ab')
> 27

27 bytes for two characters! Except it isn't: it's actually 25 bytes
for the object header and two bytes for the two characters.

>>>> sys.getsizeof('a\U0001d11e')
> 48

And here you have four bytes each for the two characters and a 40-byte
header. Observe:

py> c = '\U0001d11e'
py> len(c)
1
py> sys.getsizeof(2*c) - sys.getsizeof(c)
4
py> sys.getsizeof(1000*c) - sys.getsizeof(999*c)
4

How big is the object overhead on a (say) thousand character string?
Just one percent:

py> (sys.getsizeof(1000*c) - 4000)/4000
0.01
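
If you want to see the same breakdown on your own interpreter, here is a
rough sketch. It assumes CPython 3.3 or later with PEP 393 flexible
string storage; the per_char_and_overhead helper is just something made
up for illustration, and the exact overhead numbers will vary by build
and platform.

import sys

def per_char_and_overhead(char, n=1000):
    # Marginal cost of one extra character: the header cancels out
    # when you difference two sizes.
    big = sys.getsizeof(char * n)
    small = sys.getsizeof(char * (n - 1))
    per_char = big - small
    # Whatever is left after the characters is the fixed object header.
    overhead = big - per_char * n
    return per_char, overhead

for label, ch in [('ASCII', 'a'), ('BMP', '\u20ac'), ('astral', '\U0001d11e')]:
    per_char, overhead = per_char_and_overhead(ch)
    print('{0}: {1} byte(s) per character, {2} bytes of fixed '
          'overhead'.format(label, per_char, overhead))

The point is that the marginal cost per character settles at 1, 2 or 4
bytes depending on the widest code point in the string, and everything
else is a fixed header whose exact size depends on the build.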

-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list