On Fri, 08 Nov 2013 12:43:43 -0800, wxjmfauth wrote:

> "(say, 1 kbyte each)": one "kilo" of characters or bytes?
>
> Glad to read some users are still living in an ascii world, at the
> "Unicode time" where an encoded code point size may vary between 1-4
> bytes.
>
> Oops, sorry, I'm wrong,

That part is true.

> it can be much more.

That part is false. You're measuring the overhead of the object
structure, not the per-character storage. This has been the case going
back to at least Python 2.2: strings are objects, and objects have
overhead.

>>>> sys.getsizeof('ab')
> 27

27 bytes for two characters! Except it isn't: it's actually 25 bytes
for the object header and two bytes for the two characters.

>>>> sys.getsizeof('a\U0001d11e')
> 48

And here you have four bytes each for the two characters and a 40-byte
header. Observe:

py> c = '\U0001d11e'
py> len(c)
1
py> sys.getsizeof(2*c) - sys.getsizeof(c)
4
py> sys.getsizeof(1000*c) - sys.getsizeof(999*c)
4

How big is the object overhead on a (say) thousand character string?
Just one percent:

py> (sys.getsizeof(1000*c) - 4000)/4000
0.01
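
If you want to see the same breakdown on your own interpreter, here is a
rough sketch. It assumes CPython 3.3 or later with PEP 393 flexible
string storage; the per_char_and_overhead helper is just something made
up for illustration, and the exact overhead numbers will vary by build
and platform.

import sys

def per_char_and_overhead(char, n=1000):
    # Marginal cost of one extra character: the header cancels out
    # when you difference two sizes.
    big = sys.getsizeof(char * n)
    small = sys.getsizeof(char * (n - 1))
    per_char = big - small
    # Whatever is left after the characters is the fixed object header.
    overhead = big - per_char * n
    return per_char, overhead

for label, ch in [('ASCII', 'a'), ('BMP', '\u20ac'), ('astral', '\U0001d11e')]:
    per_char, overhead = per_char_and_overhead(ch)
    print('{0}: {1} byte(s) per character, {2} bytes of fixed '
          'overhead'.format(label, per_char, overhead))

The point is that the marginal cost per character settles at 1, 2 or 4
bytes depending on the widest code point in the string, and everything
else is a fixed header whose exact size depends on the build.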

-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list