Re: 10GB XML Blows out Memory, Suggestions?

Fredrik Lundh Wed, 07 Jun 2006 09:33:25 -0700

fuzzylollipop wrote:

> depends on the CODE and the SIZE of the file, in this case
> processing 10GB of file, unless that file is heavily encrypted or
> compressed, the process will be IO bound PERIOD!


so the fact that

     from xml.dom import pulldom

     for token, node in pulldom.parse(file):
         pass

is 50-200% slower than

     import xml.etree.ElementTree as ET

     for event, elem in ET.iterparse(file):
         if elem.tag == "item":
             elem.clear()

when reading a gigabyte-sized XML file, is due to an unexpected slowdown 
in the I/O subsystem after importing xml.dom?
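
for anyone who wants to check this on their own machine, here is a minimal timing sketch. it is not the benchmark behind the numbers above: the synthetic document, the helper names (make_xml, time_pulldom, time_iterparse) and the item count are all made up for illustration, and the data is parsed from memory so that disk I/O is out of the picture entirely. any difference in wall time between the two loops then comes from the parsing code, not from reading bytes.

```python
import io
import time
import xml.etree.ElementTree as ET
from xml.dom import pulldom


def make_xml(n_items):
    # Hypothetical test document: <root> holding n_items small <item> records.
    parts = ["<root>"]
    for i in range(n_items):
        parts.append("<item id='%d'><name>n%d</name></item>" % (i, i))
    parts.append("</root>")
    return "".join(parts).encode("utf-8")


def time_pulldom(data):
    # Pull every event from pulldom and do nothing with it.
    start = time.perf_counter()
    events = 0
    for token, node in pulldom.parse(io.BytesIO(data)):
        events += 1
    return events, time.perf_counter() - start


def time_iterparse(data):
    # Walk "end" events with iterparse, clearing each finished item
    # so the tree does not grow with the size of the input.
    start = time.perf_counter()
    items = 0
    for event, elem in ET.iterparse(io.BytesIO(data)):
        if elem.tag == "item":
            items += 1
            elem.clear()
    return items, time.perf_counter() - start


if __name__ == "__main__":
    data = make_xml(20000)  # arbitrary size, scale up as needed
    events, t_pulldom = time_pulldom(data)
    items, t_iterparse = time_iterparse(data)
    print("pulldom:   %d events in %.3f s" % (events, t_pulldom))
    print("iterparse: %d items in %.3f s" % (items, t_iterparse))
```

the absolute numbers will depend on the Python version and the shape of the data, so treat the output as a rough comparison only.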

> I work with terabytes of files, and all our Python code is just as fast
> as equivalent C code for IO bound processes.

so how large are the things that you're actually *processing* in your 
Python code?  megabyte blobs or 100-1000 byte records?  or even smaller 
things?

</F>

-- 
http://mail.python.org/mailman/listinfo/python-list
