Python 2.7.1 (r271:86832, Nov 27 2010, 17:19:03) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> L = []
>>> for i in xrange(100000):
...     L.append(str(i) * (1000 / len(str(i))))
...
>>> sys.getsizeof(L)
824464
>>> L = []
>>> for i in xrange(20000):
...     L.append(str(i) * (5000 / len(str(i))))
...
>>> sys.getsizeof(L)
178024
>>>
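One caveat when reading those two numbers: sys.getsizeof(L) counts only the list object itself (its header plus one pointer slot per element), not the strings the list refers to, so 824464 and 178024 mostly reflect 100000 slots versus 20000 slots. A minimal sketch of measuring both parts, assuming Python 3 syntax (range and // in place of xrange and /); build and sizes are helper names made up for this example, and the exact byte counts will differ between Python versions and builds:

```python
import sys


def build(count, width):
    """Build the same list as in the session: str(i) repeated width // len(str(i)) times."""
    # // keeps the integer division that Python 2's / performs on ints.
    return [str(i) * (width // len(str(i))) for i in range(count)]


def sizes(strings):
    """Return (size of the list object alone, summed size of the string objects)."""
    shallow = sys.getsizeof(strings)                   # list header + pointer slots only
    payload = sum(sys.getsizeof(s) for s in strings)   # each string object, header included
    return shallow, payload


def main():
    for count, width in ((100000, 1000), (20000, 5000)):
        L = build(count, width)
        shallow, payload = sizes(L)
        print("count=%d width=%d list=%d bytes strings=%d bytes"
              % (count, width, shallow, payload))
        del L  # release the previous list before building the next one


if __name__ == "__main__":
    main()
```

The second figure is still only the sum of the string objects as the interpreter reports them; it says nothing about what the allocator or the operating system holds on to behind the scenes.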
~/santa

On Wed, Mar 16, 2011 at 11:20 AM, Amit Dev <amit...@gmail.com> wrote:

> sum(map(len, l)) => 99999100 for 1st case and 99998200 for 2nd case.
> Roughly 100MB as I mentioned.
>
> On Wed, Mar 16, 2011 at 11:21 PM, John Gordon <gor...@panix.com> wrote:
> > In <mailman.988.1300289897.1189.python-l...@python.org> Amit Dev <amit...@gmail.com> writes:
> >
> >> I'm observing a strange memory usage pattern with strings. Consider
> >> the following session. Idea is to create a list which holds some
> >> strings so that cumulative characters in the list is 100MB.
> >
> >> >>> l = []
> >> >>> for i in xrange(100000):
> >> ...     l.append(str(i) * (1000/len(str(i))))
> >
> >> This uses around 100MB of memory as expected and 'del l' will clear that.
> >
> >> >>> for i in xrange(20000):
> >> ...     l.append(str(i) * (5000/len(str(i))))
> >
> >> This is using 165MB of memory. I really don't understand where the
> >> additional memory usage is coming from.
> >
> >> If I reduce the string size, it remains high till it reaches around
> >> 1000. In that case it is back to 100MB usage.
> >
> > I don't know anything about the internals of python storage -- overhead,
> > possible merging of like strings, etc. but some simple character counting
> > shows that these two loops do not produce the same number of characters.
> >
> > The first loop produces:
> >
> > Ten single-digit values of i which are repeated 1000 times for a total of
> > 10000 characters;
> >
> > Ninety two-digit values of i which are repeated 500 times for a total of
> > 45000 characters;
> >
> > Nine hundred three-digit values of i which are repeated 333 times for a
> > total of 299700 characters;
> >
> > Nine thousand four-digit values of i which are repeated 250 times for a
> > total of 2250000 characters;
> >
> > Ninety thousand five-digit values of i which are repeated 200 times for
> > a total of 18000000 characters.
> >
> > All that adds up to a grand total of 20604700 characters.
> >
> > Or, to condense the above long-winded text in table form:
> >
> > range          count   num digits   1000/len(str(i))   total chars
> > 0-9               10   1                        1000         10000
> > 10-99             90   2                         500         45000
> > 100-999          900   3                         333        299700
> > 1000-9999       9000   4                         250       2250000
> > 10000-99999    90000   5                         200      18000000
> >                                                           ========
> > grand total chars                                         20604700
> >
> > The second loop yields this table:
> >
> > range          count   num digits   5000/len(str(i))   total chars
> > 0-9               10   1                        5000         50000
> > 10-99             90   2                        2500        225000
> > 100-999          900   3                        1666       1499400
> > 1000-9999       9000   4                        1250      11250000
> > 10000-19999    10000   5                        1000      10000000
> >                                                           ========
> > grand total chars                                         23024400
> >
> > The two loops do not produce the same numbers of characters, so I'm not
> > surprised they do not consume the same amount of storage.
> >
> > P.S.: Please forgive me if I've made some basic math error somewhere.
> >
> > --
> > John Gordon                   A is for Amy, who fell down the stairs
> > gor...@panix.com              B is for Basil, assaulted by bears
> >                                 -- Edward Gorey, "The Gashlycrumb Tinies"
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
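
The quoted tables multiply the number of values by the repeat count, which gives the number of repetitions rather than the number of characters: each repetition is len(str(i)) characters long. A quick check without building any strings, as a sketch in Python 3 syntax (// in place of Python 2's integer /); chars_by_digits and total_chars are names made up for this example:

```python
from collections import defaultdict


def chars_by_digits(count, width):
    """Return {digit count: total characters} for str(i) * (width // len(str(i)))."""
    totals = defaultdict(int)
    for i in range(count):
        digits = len(str(i))
        repeats = width // digits            # integer division, like / on ints in Python 2
        totals[digits] += digits * repeats   # every repeat contributes `digits` characters
    return dict(totals)


def total_chars(count, width):
    """Total characters across the whole list."""
    return sum(chars_by_digits(count, width).values())


print(total_chars(100000, 1000))  # 99999100
print(total_chars(20000, 5000))   # 99998200
```

Both totals land just under 100 million characters, in line with the "roughly 100MB" figure, so the two loops differ by only 900 characters of payload and that difference cannot account for 100MB versus 165MB.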