On Wed, Feb 5, 2014 at 10:00 PM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
>> where stopWords.txt is a file of size 4KB
>
> My guess is that if you split a 4K file into words, then put the words
> into a list, you'll probably end up with 6-8K in memory.
I'd guess rather more; Python strings have a fair bit of fixed
overhead, so with a whole lot of small strings, it gets more costly.

>>> import sys
>>> sys.version
'3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan  5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)]'
>>> sys.getsizeof("asdf")
29

"Stop words" tend to be short words rather than long ones, so I'd
expect an average of 2-3 letters per word. Assuming they're separated
by spaces or newlines, that means there'll be roughly a thousand of
them in a 4KB file, for about 25K of overhead. A bit less if the
words are longer, but still quite a bit. (Byte strings have slightly
less overhead, 17 bytes apiece, but it still adds up.)

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
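
[Editor's sketch, not part of the original post: to put rough numbers on
the estimate above on your own machine, something like the following
would do. It assumes the stopWords.txt from the thread is in the current
directory, and it only counts the string objects and the list itself,
ignoring any interning or sharing.]

import sys

# Read the stop-word file and split it into words, as in the thread.
with open("stopWords.txt") as f:
    words = f.read().split()

# sys.getsizeof() reports each string object's header plus its character
# data, so short ASCII strings are dominated by the fixed overhead.
string_bytes = sum(sys.getsizeof(w) for w in words)

# The list object itself also stores one pointer per element.
list_bytes = sys.getsizeof(words)

print(len(words), "words")
print("string objects:", string_bytes, "bytes")
print("list object:   ", list_bytes, "bytes")
print("total:         ", string_bytes + list_bytes, "bytes")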