fuzzylollipop wrote:

> depends on the CODE and the SIZE of the file, in this case
> processing 10GB of file, unless that file is heavily encrypted or
> compressed, the process will be IO bound PERIOD!

so the fact that

    for token, node in pulldom.parse(file):
        pass

is 50-200% slower than

    for event, elem in ET.iterparse(file):
        if elem.tag == "item":
            elem.clear()

when reading a gigabyte-sized XML file, is due to an unexpected
slowdown in the I/O subsystem after importing xml.dom?

> I work with TeraBytes of files, and all our Python code is just as fast
> as equivalent C code for IO bound processes.

so how large are the things that you're actually *processing* in your
Python code?  megabyte blobs or 100-1000 byte records?  or even
smaller things?

</F>
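
for anyone who wants to try the comparison above, here is a minimal timing sketch. it is not the benchmark behind the 50-200% figure: the helper names (make_sample, count_pulldom, count_iterparse, timed) and the generated document are made up for illustration, and the data is parsed from memory rather than from a gigabyte-sized file, so disk I/O plays no part in either timing. the loops also count items instead of doing nothing, so that the two results can be checked against each other.

```python
import io
import time
import xml.etree.ElementTree as ET
from xml.dom import pulldom


def make_sample(n_items):
    """Build an in-memory XML document with n_items <item> elements (hypothetical test data)."""
    body = "".join(f"<item id='{i}'>value {i}</item>" for i in range(n_items))
    return f"<root>{body}</root>".encode("utf-8")


def count_pulldom(data):
    """Walk the full pulldom event stream and count <item> start tags."""
    count = 0
    for token, node in pulldom.parse(io.BytesIO(data)):
        if token == pulldom.START_ELEMENT and node.tagName == "item":
            count += 1
    return count


def count_iterparse(data):
    """Walk the iterparse "end" events, count <item> elements and clear them."""
    count = 0
    for event, elem in ET.iterparse(io.BytesIO(data)):
        if elem.tag == "item":
            count += 1
            elem.clear()  # drop the element's children and text to keep memory flat
    return count


def timed(func, data):
    """Return (result, elapsed seconds) for a single call of func(data)."""
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    data = make_sample(100_000)
    for func in (count_pulldom, count_iterparse):
        items, seconds = timed(func, data)
        print(f"{func.__name__}: {items} items in {seconds:.2f}s")
```

both functions read the same bytes from memory, so whatever gap shows up between them is time spent in the parsing code, not in the I/O subsystem.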