Igor V. Rafienko wrote:

> Finally, I thought about keeping track of when to clear and when not
> to by subscribing to start and end elements (so that I would collect
> the entire <schnappi>-subtree in memory and only then release it):
>
>     from cElementTree import iterparse
>     clear_flag = True
>     for event, elem in iterparse("data.xml", ("start", "end")):
>         if event == "start" and elem.tag == "schnappi":
>             # start collecting elements
>             clear_flag = False
>         if event == "end" and elem.tag == "schnappi":
>             clear_flag = True
>             # do something with elem
>         # unless we are collecting elements, clear()
>         if clear_flag:
>             elem.clear()
>
> This gave me the desired behaviour, but:
>
> * It looks *very* ugly
> * It's twice as slow as the version which sees 'end'-events only.
>
> Now, there *has* to be a better way. What am I missing?
the iterparse/clear approach works best if your XML file has a record-like
structure.  if you have toplevel records with lots of schnappi records in
them, iterate over the records and use find (etc) to locate the subrecords
you're interested in:

    for event, elem in iterparse("data.xml"):
        if elem.tag == "record":
            # deal with schnappi subrecords
            for schnappi in elem.findall(".//schnappi"):
                process(schnappi)
            elem.clear()

the collect flag approach isn't that bad ("twice as slow" doesn't really
say much: "raw" cElementTree is extremely fast compared to the Python
interpreter, so everything you end up doing in Python will slow things
down quite a bit).

to make your application code look a bit less convoluted, put the logic
in a generator function:

    # in library
    def process(filename, annoying_animal):
        clear = True
        start = "start"; end = "end"
        for event, elem in iterparse(filename, (start, end)):
            if elem.tag == annoying_animal:
                if event is start:
                    clear = False
                else:
                    yield elem
                    clear = True
            if clear:
                elem.clear()

    # in application
    for subelem in process(filename, "schnappi"):
        # do something with subelem
        pass

(I've reorganized the code a bit to cut down on the operations.  also
note the "is" trick; iterparse returns the event strings you pass in, so
comparing on object identities is safe)

an alternative is to use the lower-level XMLParser class (which is
similar to SAX, but faster), but that will most likely result in more
and trickier Python code...

</F>

-- 
http://mail.python.org/mailman/listinfo/python-list
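For readers on a current Python, here is a small self-contained sketch of the record-oriented approach described in the reply. It assumes Python 3's `xml.etree.ElementTree` (which uses the C accelerator automatically) in place of the old standalone `cElementTree` module, and it uses a hypothetical in-memory sample document with `<record>` elements so it can run without a `data.xml` file. The function name `schnappi_names` is made up for illustration.

```python
import io
from xml.etree.ElementTree import iterparse

# Hypothetical sample document: top-level <record> elements, each holding
# zero or more <schnappi> subrecords (this layout is an assumption).
SAMPLE = b"""<root>
  <record id="1"><schnappi name="a"/><other/><schnappi name="b"/></record>
  <record id="2"><nested><schnappi name="c"/></nested></record>
</root>"""

def schnappi_names(source):
    """Collect the name attribute of every <schnappi> below each <record>."""
    names = []
    # Default events are ("end",) only, so each element arrives fully built.
    for event, elem in iterparse(source):
        if elem.tag == "record":
            # The whole record subtree is in memory here, so search it.
            for schnappi in elem.findall(".//schnappi"):
                names.append(schnappi.get("name"))
            # Drop the record's children, attributes and text.
            elem.clear()
    return names

print(schnappi_names(io.BytesIO(SAMPLE)))  # ['a', 'b', 'c']
```

Note that `clear()` empties each record but leaves the empty record element attached to the root; for very many records that residue is small but not zero.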
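A variation on the generator function from the reply, as a runnable sketch. It is not the original code: instead of a boolean flag it counts how many target elements are currently open, so a target nested inside another target stays intact, it only clears elements at their "end" event, and it compares events with `==` rather than relying on the `is` trick. It assumes Python 3's `xml.etree.ElementTree`; the name `iter_subtrees` and the sample data are hypothetical.

```python
import io
from xml.etree.ElementTree import iterparse

def iter_subtrees(source, tag):
    """Yield each complete <tag> subtree, clearing elements outside it."""
    depth = 0  # number of currently open <tag> elements
    for event, elem in iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == tag:
                depth += 1  # entering a subtree we want to keep
        elif elem.tag == tag:
            depth -= 1
            if depth == 0:
                yield elem    # outermost <tag> is complete
                elem.clear()  # runs when the caller asks for the next item
        elif depth == 0:
            elem.clear()      # finished element outside any kept subtree

# Hypothetical sample data; tag names follow the thread.
SAMPLE = b"""<root>
  <record><schnappi name="a"><leg/><leg/></schnappi><noise/></record>
  <record><schnappi name="b"><leg/></schnappi></record>
</root>"""

# Use each subtree inside the loop body: it is cleared on the next step.
collected = [(s.get("name"), len(s))
             for s in iter_subtrees(io.BytesIO(SAMPLE), "schnappi")]
print(collected)  # [('a', 2), ('b', 1)]
```

As with the original generator, each yielded element is only valid inside the consumer's loop body; copy out anything you need before asking for the next one.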
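To give a feel for the lower-level alternative mentioned at the end of the reply, here is a minimal sketch using a parser target with `XMLParser` from Python 3's `xml.etree.ElementTree`. The parser calls the target's `start`, `end`, `data` and `close` methods as it goes, SAX-style, and never builds a tree, so no `clear()` is needed; the trade-off is that any state you care about has to be tracked by hand. The class `SchnappiTarget`, the function `names_via_target` and the input chunks are hypothetical.

```python
from xml.etree.ElementTree import XMLParser

class SchnappiTarget:
    """Parser target: collects the name attribute of each <schnappi>."""

    def __init__(self):
        self.names = []

    def start(self, tag, attrib):
        # Called for each start tag; attrib is a dict of attributes.
        if tag == "schnappi":
            self.names.append(attrib.get("name"))

    def end(self, tag):
        pass  # end tags are not needed for this sketch

    def data(self, text):
        pass  # character data is ignored

    def close(self):
        # XMLParser.close() returns whatever the target's close() returns.
        return self.names

def names_via_target(chunks):
    parser = XMLParser(target=SchnappiTarget())
    for chunk in chunks:  # feed the document incrementally
        parser.feed(chunk)
    return parser.close()

print(names_via_target([b"<root><record><schnappi name='a'/>",
                        b"<schnappi name='b'/></record></root>"]))
# ['a', 'b']
```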