kj wrote:
> I want to write code that parses a file that is far bigger than
> the amount of memory I can count on. Therefore, I want to stay as
> far away as possible from anything that produces a memory-resident
> DOM tree.
>
> The top-level structure of this xml is very simple: it's just a
> very long list of "records". All the complexity of the data is at
> the level of the individual records, but these records are tiny in
> size (relative to the size of the entire file).
>
> So the ideal would be a "parser-iterator", which parses just enough
> of the file to "yield" (in the generator sense) the next record,
> thereby returning control to the caller; the caller can process
> the record, delete it from memory, and return control to the
> parser-iterator; once parser-iterator regains control, it repeats
> this sequence starting where it left off.
How about

http://effbot.org/zone/element-iterparse.htm#incremental-parsing

Peter
--
http://mail.python.org/mailman/listinfo/python-list
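For what it's worth, here is a rough, untested sketch of the generator pattern that page describes, built on xml.etree.ElementTree.iterparse. It assumes the records are <record> elements; substitute the real tag name (and a process() function of your own):

    import xml.etree.ElementTree as ET

    def iter_records(path, record_tag="record"):
        # Ask for start events too, so we can grab the root element up front.
        context = ET.iterparse(path, events=("start", "end"))
        event, root = next(context)
        for event, elem in context:
            if event == "end" and elem.tag == record_tag:
                yield elem       # hand the finished record to the caller
                root.clear()     # drop already-processed records so memory stays flat

    # Usage:
    # for record in iter_records("huge.xml"):
    #     process(record)

The root.clear() after each yield is what keeps the tree from growing to the size of the file.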