In article <mailman.1895.1281422126.1673.python-l...@python.org>,
Stefan Behnel <stefan...@behnel.de> wrote:
>Christian Heimes, 10.08.2010 01:39:
>> On 10.08.2010 01:20, Aahz wrote:
>>> The docs say, "Parses an XML section into an element tree incrementally".
>>> Sure sounds like it retains the entire parsed tree in RAM. Not good.
>>> Again, how do you parse an XML file larger than your available memory
>>> using something other than SAX?
>>
>> The document at
>> http://www.ibm.com/developerworks/xml/library/x-hiperfparse/ explains it
>> one way.
>>
>> The iterparser approach is ingenious, but it doesn't work for every XML
>> format. Let's say you have a 10 GB XML file with one million <part/>
>> tags. An iterparser doesn't load the entire document. Instead it
>> iterates over the file and yields (for example) one million ElementTrees,
>> one for each <part/> tag and its children. You can get the nice API of
>> ElementTree with the memory efficiency of a SAX parser if you obey
>> "Listing 4".
>
>In the very common case that you are interested in all children of the root
>element, it's even enough to intercept on the specific tag name (lxml.etree
>has an option for that, but an 'if' block will do just fine in ET) and just
>".clear()" the child element at the end of the loop body. That results in
>very fast and simple code, but will leave the tags in the tree while only
>removing their content and attributes. Usually works well enough for
>several tens of thousands of elements, especially when using cElementTree.
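Just to make sure I've understood: combining the two suggestions, the loop
would look roughly like this (a rough sketch, not tested; "parts.xml" and
the "id" attribute are made up for illustration, the <part/> tag is from
the example above):

# Sketch of the iterparse + .clear() pattern described above.  The file
# name "parts.xml" is hypothetical; <part/> is the example tag from the
# discussion.  Untested, but the calls are standard ElementTree API.
try:
    import xml.etree.cElementTree as ET   # C accelerator, as Stefan suggests
except ImportError:
    import xml.etree.ElementTree as ET

def iter_parts(filename):
    # "end" events fire once each element (and its children) is complete.
    for event, elem in ET.iterparse(filename, events=("end",)):
        if elem.tag == "part":  # the plain 'if' standing in for lxml's tag filter
            yield elem
            # Drop the children, text and attributes we just processed.
            # The empty <part/> shell stays attached to the root, which is
            # the trade-off Stefan mentions.
            elem.clear()

for part in iter_parts("parts.xml"):
    print(part.get("id"))       # whatever per-record work you need

That keeps only one <part/> subtree fully built at a time instead of the
whole 10 GB document.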
Thanks to both of you!
-- 
Aahz (a...@pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...if I were on life-support, I'd rather have it run by a Gameboy than
a Windows box."  --Cliff Wells