I am thinking of calling a new subprocess which will do the memory-hungry job and then release the memory when it exits, as described in the link below:

http://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python/1316799#1316799
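Roughly what I have in mind (an untested sketch; parse_day() is a hypothetical stand-in for the per-day extract/parse/write work, and the day list is made up):

    import multiprocessing

    def parse_day(day):
        # hypothetical stand-in for the memory-hungry per-day work:
        # extract the dump, parse the rows, write the hdf5 file
        print('parsing %s' % day)

    if __name__ == '__main__':
        for day in ['2012.08.07', '2012.08.08']:   # assumed day list
            p = multiprocessing.Process(target=parse_day, args=(day,))
            p.start()
            p.join()   # child's memory is returned to the OS when it exits

Since each day runs in its own process, the OS reclaims all of that process's memory when it exits, no matter how fragmented its heap became.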
Regards,
Alok

-----Original Message-----
From: Dave Angel [mailto:d...@davea.name]
Sent: Monday, September 17, 2012 10:13 AM
To: Jadhav, Alok
Cc: python-list@python.org
Subject: Re: Python garbage collector/memory manager behaving strangely

On 09/16/2012 09:07 PM, Jadhav, Alok wrote:
> Hi Everyone,
>
> I have a simple program which reads a large file containing a few
> million rows, parses each row (`numpy array`), converts it into an
> array of doubles (`python array`) and later writes it into an `hdf5
> file`. I repeat this loop for multiple days. After reading each file, I
> delete all the objects and call the garbage collector. When I run the
> program, the first day is parsed without any error, but on the second
> day I get `MemoryError`. I monitored the memory usage of my program:
> during the first day of parsing, memory usage is around **1.5 GB**.
> When the first day's parsing is finished, memory usage goes down to
> **50 MB**. Now when the 2nd day starts and I try to read the lines from
> the file, I get `MemoryError`. Following is the output of the program.
>
> source file extracted at C:\rfadump\au\2012.08.07.txt
> parsing started
> current time: 2012-09-16 22:40:16.829000
> 500000 lines parsed
> 1000000 lines parsed
> 1500000 lines parsed
> 2000000 lines parsed
> 2500000 lines parsed
> 3000000 lines parsed
> 3500000 lines parsed
> 4000000 lines parsed
> 4500000 lines parsed
> 5000000 lines parsed
> parsing done.
> end time is 2012-09-16 23:34:19.931000
> total time elapsed 0:54:03.102000
> repacking file
> done
>
> > s:\users\aaj\projects\pythonhf\rfadumptohdf.py(132)generateFiles()
> -> while single_date <= self.end_date:
> (Pdb) c
> *** 2012-08-08 ***
> source file extracted at C:\rfadump\au\2012.08.08.txt
> cought an exception while generating file for day 2012-08-08.
> Traceback (most recent call last):
>   File "rfaDumpToHDF.py", line 175, in generateFile
>     lines = self.rawfile.read().split('|\n')
> MemoryError
>
> I am very sure that the Windows task manager shows the memory usage as
> **50 MB** for this process. It looks like the garbage collector or
> memory manager for Python is not calculating the free memory correctly.
> There should be a lot of free memory, but it thinks there is not enough.
>
> Any idea?
>
> Thanks.
>
> Alok Jadhav
> CREDIT SUISSE AG
> GAT IT Hong Kong, KVAG 67
> International Commerce Centre | Hong Kong | Hong Kong
> Phone +852 2101 6274 | Mobile +852 9169 7172
> alok.jad...@credit-suisse.com | www.credit-suisse.com

Don't blame CPython. You're trying to do a read() of a large file, which
results in a single large string. Then you split it into lines. Why not
just read it in as lines, so the large string is never needed? Take a
look at the readlines() function. Chances are that even that is
unnecessary, but I can't tell without seeing more of the code.

    lines = self.rawfile.read().split('|\n')   # builds the whole file as one string
    lines = self.rawfile.readlines()           # avoids the single giant allocation

When a single large item is being allocated, it's not enough to have
sufficient free space; the space also has to be contiguous. After a
program runs for a while, its address space naturally gets more and more
fragmented. It's the nature of the C runtime, and CPython is stuck with it.
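For instance, something like this untested sketch processes the file one
line at a time, so only one record is ever in memory at once
(process_record() is a made-up stand-in for your per-row parsing, and the
path is taken from your output):

    def process_record(record):
        # made-up stand-in for the per-row numpy/hdf5 work
        pass

    with open(r'C:\rfadump\au\2012.08.08.txt') as rawfile:
        for line in rawfile:
            record = line.rstrip('\n')
            if record.endswith('|'):   # your records are delimited by '|\n'
                record = record[:-1]
            process_record(record)

Reading line by line also never needs one contiguous block the size of the
whole file, so it sidesteps the fragmentation problem described above.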
--
DaveA