On Mon, 2010-12-27 at 22:55 +0100, Stefan Behnel wrote:
> Alan Meyer, 27.12.2010 21:40:
> > On 12/21/2010 3:16 AM, Stefan Behnel wrote:
> >> Adam Tauno Williams, 20.12.2010 20:49:
> > ...
> >>> You need to process the document as a stream of elements; aka SAX.
> >> IMHO, this is the worst advice you can give.
> > Why do you say that? I would have thought that using SAX in this
> > application is an excellent idea.
> From my experience, SAX is only practical for very simple cases where
> little state is involved when extracting information from the parse events.
> A typical example is gathering statistics based on single tags - not a very
> common use case. Anything that involves knowing where in the XML tree you
> are to figure out what to do with the event is already too complicated.

I've found that using a stack model makes traversing complex documents
with SAX quite manageable. For example, I parse BPML files with SAX. If
the document is nested and context-sensitive, then I really don't see how
iterparse differs all that much.

> My serious advice is: don't waste your time learning SAX. It's simply too
> frustrating to debug SAX extraction code into existence. Given how simple
> and fast it is to extract data with ElementTree's iterparse() in a memory
> efficient way, there is really no reason to write complicated SAX code
> instead.

--
http://mail.python.org/mailman/listinfo/python-list
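
To make the stack-model point concrete, here is a minimal sketch of both approaches side by side. It is only an illustration under assumptions: the sample document, its element names (process, sequence, action, onFault) and the helper names are made up for the example and are not taken from any real BPML schema. The rule being implemented is context-sensitive: collect only the actions that sit directly under process/sequence.

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only
# and do not follow the real BPML schema.
SAMPLE = b"""<process name="order">
  <sequence>
    <action name="validate"/>
    <action name="ship"/>
  </sequence>
  <onFault>
    <action name="notify"/>
  </onFault>
</process>"""


class StackHandler(xml.sax.ContentHandler):
    """SAX handler that keeps the path from the root on a stack."""

    def __init__(self):
        super().__init__()
        self.stack = []   # names of the currently open elements
        self.found = []   # action names collected so far

    def startElement(self, name, attrs):
        self.stack.append(name)
        # Context-sensitive rule: only actions directly under process/sequence.
        if self.stack == ["process", "sequence", "action"]:
            self.found.append(attrs.get("name"))

    def endElement(self, name):
        self.stack.pop()


def actions_with_sax(data):
    handler = StackHandler()
    xml.sax.parseString(data, handler)
    return handler.found


def actions_with_iterparse(data):
    found = []
    stack = []
    events = ET.iterparse(io.BytesIO(data), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            stack.append(elem.tag)
            # Attributes are already available at the "start" event.
            if stack == ["process", "sequence", "action"]:
                found.append(elem.get("name"))
        else:
            stack.pop()
            elem.clear()  # drop the element's content to keep memory use low
    return found


if __name__ == "__main__":
    print(actions_with_sax(SAMPLE))
    print(actions_with_iterparse(SAMPLE))
```

In this sketch both versions carry the same stack to know where they are in the tree, and both skip the action under onFault. The iterparse version could instead inspect children on the "end" event of sequence, which is where it starts to differ from SAX; which style is easier to debug is a matter of taste.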