In article <mailman.1895.1281422126.1673.python-l...@python.org>,
Stefan Behnel <stefan...@behnel.de> wrote:
>Christian Heimes, 10.08.2010 01:39:
>> On 10.08.2010 01:20, Aahz wrote:
>>> The docs say, "Parses an XML section into an element tree incrementally".
>>> Sure sounds like it retains the entire parsed tree in RAM. Not good.
>>> Again, how do you parse an XML file larger than your available memory
>>> using something other than SAX?
>>
>> The document at
>> http://www.ibm.com/developerworks/xml/library/x-hiperfparse/ explains it
>> one way.
>>
>> The iterparser approach is ingenious, but it doesn't work for every XML
>> format. Let's say you have a 10 GB XML file with one million <part/>
>> tags. An iterparser doesn't load the entire document. Instead it
>> iterates over the file and yields (for example) one million ElementTrees,
>> one for each <part/> tag and its children. You can get the nice API of
>> ElementTree with the memory efficiency of a SAX parser if you obey
>> "Listing 4".
>
>In the very common case that you are interested in all children of the root
>element, it's even enough to intercept on the specific tag name (lxml.etree
>has an option for that, but an 'if' block will do just fine in ET) and just
>".clear()" the child element at the end of the loop body. That results in
>very fast and simple code, but will leave the tags in the tree while only
>removing their content and attributes. Usually works well enough for
>several tens of thousands of elements, especially when using cElementTree.
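Just to make sure I've understood: combining the two suggestions, the loop
would look roughly like this (a rough sketch, not tested; "parts.xml" and
the "id" attribute are made up for illustration, the <part/> tag is from
the example above):

# Sketch of the iterparse + .clear() pattern described above.  The file
# name "parts.xml" is hypothetical; <part/> is the example tag from the
# discussion.  Untested, but the calls are standard ElementTree API.
try:
    import xml.etree.cElementTree as ET   # C accelerator, as Stefan suggests
except ImportError:
    import xml.etree.ElementTree as ET

def iter_parts(filename):
    # "end" events fire once each element (and its children) is complete.
    for event, elem in ET.iterparse(filename, events=("end",)):
        if elem.tag == "part":  # the plain 'if' standing in for lxml's tag filter
            yield elem
            # Drop the children, text and attributes we just processed.
            # The empty <part/> shell stays attached to the root, which is
            # the trade-off Stefan mentions.
            elem.clear()

for part in iter_parts("parts.xml"):
    print(part.get("id"))       # whatever per-record work you need

That keeps only one <part/> subtree fully built at a time instead of the
whole 10 GB document.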
Thanks to both of you!
-- 
Aahz (a...@pythoncraft.com)           <*>         http://www.pythoncraft.com/

"...if I were on life-support, I'd rather have it run by a Gameboy than
a Windows box."  --Cliff Wells