Carbon Man wrote:
> Very new to Python, running 2.5 on windows.
> I am processing an XML file (7.2MB). Using the standard library I am
> recursively processing each node and parsing it. The branches don't go
> particularly deep. What is happening is that the program is running really
> really slowly, so slow that even running it over night, it still doesn't
> finish.
> Stepping through it I have noticed that memory usage has shot up from 190MB
> to 624MB and continues to climb.

That does sound like a problem in the code. But even if the XML file is
only 7.2 MB, the XML structures and what you create out of them have some
overhead.

> If I set a break point and then stop the
> program the memory is not released. It is not until I shutdown PythonWin
> that the memory gets released.

Then you're apparently looking at VSIZE or whatever it's called on Windows.
It's the maximum memory the process ever allocated. And this usually *never*
decreases, no matter what the application (Python or otherwise).

> [GC experiments]

Unless you have circular references, in my experience automatic garbage
collection in Python works fine. I never had to mess with it myself in 10
years of Python usage.

> If I have the program at a break and do gc.collect() it doesn't fix it, so
> whatever referencing is causing problems is still active.
> My program is parsing the XML and generating a Python program for
> SQLalchemy, but the program never gets a chance to run the memory problem is
> prior to that. It probably has something to do with the way I am string
> building.

Yes, you're apparently concatenating strings. A lot. Don't do that. At least
not this way:

    s = ""
    s += "something"
    s += "else"

Instead, do this:

    from cStringIO import StringIO

    s = StringIO()
    s.write("something")
    s.write("else")
    ...
    s.seek(0)
    print s.read()

or

    lst = []
    lst.append("something")
    lst.append("else")
    print "".join(lst)

> My apologies for the long post but without being able to see the code I
> doubt anyone can give me a solid answer so here it goes (sorry for the lack
> of comments):

[...]

Code snipped.

Two tips:

Use one of the above methods for concatenating strings. This is a common
problem in Python (and in other languages; Java and C# also have
StringBuilder classes because of this).

If you want to speed up your XML processing, use the ElementTree module in
the standard library. It's a lot easier to use and also faster than what
you're using currently.

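To make the two tips concrete, here is a minimal sketch that parses with ElementTree and collects the generated text in a list that is joined once at the end. The XML layout (`schema`, `table`, `column` and their attributes), the output format, and the function name `generate_source` are all made up for illustration; your real file and generated SQLAlchemy code will look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: element and attribute names are invented for this sketch.
SAMPLE_XML = """\
<schema>
  <table name="users">
    <column name="id" type="Integer"/>
    <column name="email" type="String"/>
  </table>
  <table name="orders">
    <column name="id" type="Integer"/>
  </table>
</schema>
"""


def generate_source(xml_text):
    """Return one line of generated text per <table> and <column> element."""
    root = ET.fromstring(xml_text)
    parts = []  # collect fragments here instead of doing s += ...
    for table in root.findall("table"):
        parts.append("table %s\n" % table.get("name"))
        for column in table.findall("column"):
            parts.append("    column %s: %s\n"
                         % (column.get("name"), column.get("type")))
    return "".join(parts)  # a single join at the very end


if __name__ == "__main__":
    print(generate_source(SAMPLE_XML))
```

The point is only the shape: iterate over elements with `findall`, append fragments to a list, and join once, rather than growing one string piece by piece.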
A bonus is that it can be swapped out for the even faster lxml module
(externally available, not in the standard library) by changing a single
import, for another noticeable performance improvement.

HTH

-- Gerhard
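For reference, the single-import swap could look roughly like the sketch below. It assumes your code only uses the part of the API that lxml and the standard-library ElementTree have in common (`fromstring`, `findall`, `.text` and the like); the sample document is made up.

```python
try:
    from lxml import etree as ET  # third-party, used if it is installed
except ImportError:
    import xml.etree.ElementTree as ET  # standard-library fallback

# Hypothetical document, just to show that the rest of the code is unchanged.
root = ET.fromstring("<root><item>a</item><item>b</item></root>")
texts = [item.text for item in root.findall("item")]
```

Everything after the import stays the same whichever module ends up bound to `ET`.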