Hi,

I am trying to understand how cElementTree's clear() works. I have a
(relatively) large XML file that I do not wish to load into memory. So,
naturally, I tried something like this:

    from cElementTree import iterparse

    count = 0
    for event, elem in iterparse("data.xml"):
        if elem.tag == "schnappi":
            count += 1
            elem.clear()
    ...

which resulted in all elements being cached in memory except for those
named <schnappi> (i.e. the process's memory footprint kept growing).

Then I thought about clear()'ing all elements that I did not really need:

    from cElementTree import iterparse

    count = 0
    for event, elem in iterparse("data.xml"):
        if elem.tag == "schnappi":
            count += 1
        elem.clear()
    ...

which gave a suitably small memory footprint, *BUT* since <schnappi> has a
number of subelements, and I subscribe to 'end' events, the <schnappi>
element is returned only after all of its subelements have been read and
clear()'ed. So I do indeed see a <schnappi> element, but calling its
getiterator() gives me completely empty subelements, which is not what I
wanted :(

Finally, I thought about keeping track of when to clear and when not to by
subscribing to both start and end events (so that I would collect the
entire <schnappi> subtree in memory and only then release it):

    from cElementTree import iterparse

    clear_flag = True
    for event, elem in iterparse("data.xml", ("start", "end")):
        if event == "start" and elem.tag == "schnappi":
            # start collecting elements
            clear_flag = False
        if event == "end" and elem.tag == "schnappi":
            clear_flag = True
            # do something with elem
        # unless we are collecting elements, clear()
        if clear_flag:
            elem.clear()

This gave me the desired behaviour, but:

* It looks *very* ugly.
* It is twice as slow as the version that sees 'end' events only.

Now, there *has* to be a better way. What am I missing?

Thanks in advance,

ivr
-- 
"...but it's HDTV -- it's got a better resolution than the real world."
        -- Fry, "When aliens attack"
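
For anyone who wants to reproduce the third variant without data.xml, here is a minimal self-contained sketch. Several things in it are assumptions and not part of the original code: it imports iterparse from xml.etree.ElementTree (the standard-library home of the same API) instead of cElementTree, the SAMPLE document and the collect_schnappi name are made up for illustration, a depth counter replaces the boolean flag so that nested <schnappi> elements would also be kept intact, and clear() is called on 'end' events only.

```python
import io
from xml.etree.ElementTree import iterparse  # assumption: stands in for cElementTree

# Hypothetical sample document standing in for data.xml.
SAMPLE = b"""<root>
  <other><x>noise</x></other>
  <schnappi><a>1</a><b>2</b></schnappi>
  <schnappi><a>3</a></schnappi>
</root>"""


def collect_schnappi(source):
    """Return one list of (tag, text) pairs per <schnappi> element."""
    results = []
    depth = 0  # number of <schnappi> elements currently open
    for event, elem in iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "schnappi":
                depth += 1  # start collecting: stop clearing below this point
            continue
        # From here on, event == "end".
        if elem.tag == "schnappi":
            depth -= 1
            # The subtree is still complete here, so read it before clearing.
            results.append([(child.tag, child.text) for child in elem])
        if depth == 0:
            # Not inside any <schnappi>: safe to drop this element's content.
            elem.clear()
    return results


print(collect_schnappi(io.BytesIO(SAMPLE)))
```

With the sample above, the children of each <schnappi> are still populated when its 'end' event arrives, while elements outside any <schnappi> are cleared as soon as they close.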