Thanks Dan for the detailed reply. I suspect it is related to FreeBSD malloc/free as you suggested. Here is the output of running your script:

[16-bsd01 ~/work]$ python strm.py --first
USER    PID %CPU %MEM    VSZ    RSS  TT  STAT STARTED      TIME COMMAND
amdev  6899  3.0  6.9 111944 107560  p0  S+    9:57PM   0:01.20 python strm.py --first (python2.5)
amdev  6900  0.0  0.1   3508   1424  p0  S+    9:57PM   0:00.02 sh -c ps aux | egrep '\\<6899\\>|^USER\\>'
amdev  6902  0.0  0.1   3380   1188  p0  S+    9:57PM   0:00.01 egrep \\<6899\\>|^USER\\>

[16-bsd01 ~/work]$ python strm.py --second
USER    PID %CPU %MEM    VSZ    RSS  TT  STAT STARTED      TIME COMMAND
amdev  6903  0.0 10.5 166216 163992  p0  S+    9:57PM   0:00.92 python strm.py --second (python2.5)
amdev  6904  0.0  0.1   3508   1424  p0  S+    9:57PM   0:00.02 sh -c ps aux | egrep '\\<6903\\>|^USER\\>'
amdev  6906  0.0  0.1   3508   1424  p0  R+    9:57PM   0:00.00 egrep \\<6903\\>|^USER\\> (sh)

Regards,
Amit

On Thu, Mar 17, 2011 at 3:21 AM, Dan Stromberg <drsali...@gmail.com> wrote:
>
> On Wed, Mar 16, 2011 at 8:38 AM, Amit Dev <amit...@gmail.com> wrote:
>>
>> I'm observing a strange memory usage pattern with strings. Consider
>> the following session. Idea is to create a list which holds some
>> strings so that cumulative characters in the list is 100MB.
>>
>> >>> l = []
>> >>> for i in xrange(100000):
>> ...     l.append(str(i) * (1000/len(str(i))))
>>
>> This uses around 100MB of memory as expected and 'del l' will clear that.
>>
>>
>> >>> for i in xrange(20000):
>> ...     l.append(str(i) * (5000/len(str(i))))
>>
>> This is using 165MB of memory. I really don't understand where the
>> additional memory usage is coming from.
>>
>> If I reduce the string size, it remains high till it reaches around
>> 1000. In that case it is back to 100MB usage.
>>
>> Python 2.6.4 on FreeBSD.
>>
>> Regards,
>> Amit
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
> On Python 2.6.6 on Ubuntu 10.10:
>
> $ cat pmu
> #!/usr/bin/python
>
> import os
> import sys
>
> list_ = []
>
> if sys.argv[1] == '--first':
>     for i in xrange(100000):
>         list_.append(str(i) * (1000/len(str(i))))
> elif sys.argv[1] == '--second':
>     for i in xrange(20000):
>         list_.append(str(i) * (5000/len(str(i))))
> else:
>     sys.stderr.write('%s: Illegal sys.argv[1]\n' % sys.argv[0])
>     sys.exit(1)
>
> os.system("ps aux | egrep '\<%d\>|^USER\>'" % os.getpid())
>
> dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
> above cmd done 2011 Wed Mar 16 02:38 PM
>
> $ make
> ./pmu --first
> USER   PID %CPU %MEM    VSZ    RSS TTY   STAT START TIME COMMAND
> 1000 11063  0.0  3.4 110212 104436 pts/5 S+   14:38 0:00 /usr/bin/python ./pmu --first
> 1000 11064  0.0  0.0   1896    512 pts/5 S+   14:38 0:00 sh -c ps aux | egrep '\<11063\>|^USER\>'
> 1000 11066  0.0  0.0   4012    740 pts/5 S+   14:38 0:00 egrep \<11063\>|^USER\>
> ./pmu --second
> USER   PID %CPU %MEM    VSZ    RSS TTY   STAT START TIME COMMAND
> 1000 11067 13.0  3.3 107540 101536 pts/5 S+   14:38 0:00 /usr/bin/python ./pmu --second
> 1000 11068  0.0  0.0   1896    508 pts/5 S+   14:38 0:00 sh -c ps aux | egrep '\<11067\>|^USER\>'
> 1000 11070  0.0  0.0   4012    740 pts/5 S+   14:38 0:00 egrep \<11067\>|^USER\>
> dstromberg-laptop-dstromberg:~/src/python-mem-use i686-pc-linux-gnu 10916 -
> above cmd done 2011 Wed Mar 16 02:38 PM
>
> So on Python 2.6.6 + Ubuntu 10.10, the second is actually a little smaller
> than the first.
>
> Some issues you might ponder:
> 1) Does FreeBSD's malloc/free know how to free unused memory pages in the
> middle of the heap (using mmap games), or does it only sbrk() down when the
> end of the heap becomes unused, or does it never sbrk() back down at all?
> I've heard various *ix's fall into one of these 3 groups in releasing unused
> pages.
>
> 2) It might be just an issue of how frequently the interpreter garbage
> collects; you could try adjusting this; check out the gc module. Note that
> it's often faster not to collect at every conceivable opportunity, but this
> tends to add up the bytes pretty quickly in some scripts - for a while,
> until the next collection. So your memory use pattern will often end up
> looking like a bit of a sawtooth function.
>
> 3) If you need strict memory use guarantees, you might be better off with a
> language that's closer to the metal, like C - something that isn't garbage
> collected is one parameter to consider. If you already have something in
> CPython, then Cython might help; Cython allows you to use C datastructures
> from a dialect of Python.
>
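
As a sanity check on the two loops quoted above, here is a minimal sketch that adds up how many characters each loop actually stores. It is written for Python 3, so `range` and `//` stand in for the `xrange` and `/` of the original session, and the helper name `payload_chars` is made up for illustration. It only counts characters; it does not measure process memory or say anything about how a particular malloc rounds up requests.

```python
# Python 3 sketch: total characters stored by each loop from the quoted session.
# payload_chars is a hypothetical helper name, not part of the original script.

def payload_chars(count, width):
    """Sum the lengths of str(i) * (width // len(str(i))) for i in range(count)."""
    total = 0
    for i in range(count):
        digits = len(str(i))
        # Same expression as the original loops, with // for integer division.
        total += digits * (width // digits)
    return total


first = payload_chars(100000, 1000)   # 100000 strings of about 1000 chars each
second = payload_chars(20000, 5000)   # 20000 strings of about 5000 chars each

print("first: ", first, "characters")
print("second:", second, "characters")
```

Both totals come out just under 100,000,000 characters, so the two runs hold essentially the same amount of string data, and the second run creates five times fewer string objects. The larger RSS for --second on FreeBSD is therefore not explained by the data itself, which is consistent with the allocator behaviour suspected above, though it does not prove it.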