"Philip Semanchuk" <phi...@semanchuk.com> wrote in message news:mailman.7530.1232375454.3487.python-l...@python.org... > > On Jan 19, 2009, at 3:12 AM, S.Selvam Siva wrote: > >> Hi all, >> >> I am running a python script which parses nearly 22,000 html files >> locally >> stored using BeautifulSoup. >> The problem is the memory usage linearly increases as the files are >> being >> parsed. >> When the script has crossed parsing 200 files or so, it consumes all the >> available RAM and The CPU usage comes down to 0% (may be due to >> excessive >> paging). >> >> We tried 'del soup_object' and used 'gc.collect()'. But, no >> improvement. >> >> Please guide me how to limit python's memory-usage or proper method for >> handling BeautifulSoup object in resource effective manner > > You need to figure out where the memory is disappearing. Try commenting > out parts of your script. For instance, maybe start with a minimalist > script: open and close the files but don't process them. See if the > memory usage continues to be a problem. Then add elements back in, making > your minimalist script more and more like the real one. If the extreme > memory usage problem is isolated to one component or section, you'll find > it this way. > > HTH > Philip
Also, are you creating a separate soup object for each file, or reusing one
object over and over? If it's a fresh soup per file, something like the
sketch below should keep memory roughly flat.

--Tim
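This is only a rough sketch (the BeautifulSoup 3-style import, the path, and
pulling out <title> are just illustrative), but the key points are building a
fresh soup per file, copying results out as plain strings, and not holding on
to any Tag or NavigableString from a finished file, since every node keeps a
reference back to its whole parse tree:

    from BeautifulSoup import BeautifulSoup   # BeautifulSoup 3.x style import
    import glob

    titles = []
    for name in glob.glob('/path/to/html/*.html'):
        f = open(name)
        try:
            soup = BeautifulSoup(f.read())
        finally:
            f.close()

        title_tag = soup.find('title')
        if title_tag and title_tag.string:
            # unicode() makes a plain string copy; keeping the NavigableString
            # itself would keep this file's entire parse tree alive.
            titles.append(unicode(title_tag.string))

        # If your BeautifulSoup version provides it, soup.decompose() tears
        # the tree down explicitly; otherwise dropping the only reference and
        # letting the cyclic garbage collector run is the fallback.
        del soup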