In <mailman.988.1300289897.1189.python-l...@python.org> Amit Dev <amit...@gmail.com> writes:
> I'm observing a strange memory usage pattern with strings. Consider
> the following session. Idea is to create a list which holds some
> strings so that cumulative characters in the list is 100MB.
>
> >>> l = []
> >>> for i in xrange(100000):
> ...     l.append(str(i) * (1000/len(str(i))))
>
> This uses around 100MB of memory as expected and 'del l' will clear that.
>
> >>> for i in xrange(20000):
> ...     l.append(str(i) * (5000/len(str(i))))
>
> This is using 165MB of memory. I really don't understand where the
> additional memory usage is coming from.
>
> If I reduce the string size, it remains high till it reaches around
> 1000. In that case it is back to 100MB usage.

I don't know anything about the internals of Python storage (overhead,
possible merging of like strings, etc.), but some simple character
counting shows that these two loops produce almost exactly the same number
of characters. The key point is that each string holds
len(str(i)) * (1000/len(str(i))) characters, so the repeat count has to be
multiplied by the number of digits being repeated.

The first loop produces:

Ten single-digit values of i, each repeated 1000 times (1000 characters
per string), for a total of 10000 characters;

Ninety two-digit values of i, each repeated 500 times (1000 characters
per string), for a total of 90000 characters;

Nine hundred three-digit values of i, each repeated 333 times (999
characters per string), for a total of 899100 characters;

Nine thousand four-digit values of i, each repeated 250 times (1000
characters per string), for a total of 9000000 characters;

Ninety thousand five-digit values of i, each repeated 200 times (1000
characters per string), for a total of 90000000 characters.

All that adds up to a grand total of 99999100 characters.

Or, to condense the above long-winded text in table form:

range        count  digits  1000/len(str(i))  total chars
0-9             10       1              1000        10000
10-99           90       2               500        90000
100-999        900       3               333       899100
1000-9999     9000       4               250      9000000
10000-99999  90000       5               200     90000000
                                                 ========
                           grand total chars     99999100

The second loop yields this table:

range        count  digits  5000/len(str(i))  total chars
0-9             10       1              5000        50000
10-99           90       2              2500       450000
100-999        900       3              1666      4498200
1000-9999     9000       4              1250     45000000
10000-19999  10000       5              1000     50000000
                                                 ========
                           grand total chars     99998200

The two loops produce nearly the same number of characters (99999100 vs.
99998200, just under the 100MB you were aiming for), so the extra memory
used by the second session isn't explained by the amount of string data
alone.

P.S.: Please forgive me if I've made some basic math error somewhere.

--
John Gordon                   A is for Amy, who fell down the stairs
gor...@panix.com              B is for Basil, assaulted by bears
                                -- Edward Gorey, "The Gashlycrumb Tinies"
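The character counting is easy to double-check with a few lines of Python. This is only a sketch: chars_by_digits is a made-up helper name, and it uses // so that the integer division matches Python 2's / on ints whether it is run under Python 2 or Python 3. It rebuilds each string with the same expression as the quoted loops and tallies the characters by number of digits in i; it counts characters only and says nothing about how much memory the interpreter actually allocates for them.

```python
# Tally the characters produced by the two loops in the quoted session.
# chars_by_digits is a hypothetical helper written only for this check.
# "//" is used so the integer division matches Python 2's "/" on ints.

def chars_by_digits(n, width):
    """Return {digit_count: total_chars} for str(i) * (width // len(str(i)))."""
    totals = {}
    for i in range(n):
        s = str(i)
        piece = s * (width // len(s))  # same expression as the loop body
        totals[len(s)] = totals.get(len(s), 0) + len(piece)
    return totals

if __name__ == "__main__":
    for n, width in [(100000, 1000), (20000, 5000)]:
        totals = chars_by_digits(n, width)
        print("range(%d), width %d:" % (n, width))
        for digits in sorted(totals):
            print("  %d-digit values: %d chars" % (digits, totals[digits]))
        print("  grand total: %d chars" % sum(totals.values()))
```

Taking len() of the string that was actually built, rather than adding up repeat counts, is what keeps the digit width from dropping out of the totals.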