[EMAIL PROTECTED] wrote:
> Hi all,
>
> I have an XML file with the following structure:
>
> <r1>
>  <r2>-----|
>   <r3>    |
>    <r4>   |
>     .     |
>     .     | --------------------> constitutes one record.
>     .     |
>     .     |
>     .     |
>    </r4>  |
>   </r3>   |
>  </r2>----|
>  <r2>
>   .
>   .
>   . -----------------------|
>   .                        |
>   .                        |
>   .                        |----------------------> there are n records in between....
>   .                        |
>   .                        |
>   .                        |
>   . -----------------------|
>   .
>   .
>  </r2>
>  <r2>-----|
>   <r3>    |
>    <r4>   |
>     .     |
>     .     | --------------------> constitutes one record.
>     .     |
>     .     |
>     .     |
>    </r4>  |
>   </r3>   |
>  </r2>----|
> </r1>
>
> Here <r1> is the main root tag of the XML, and <r2>...</r2>
> constitutes one record. What I would like to do is to extract
> everything (XML tags and data) between the nth <r2> tag and the
> (n+k)th <r2> tag. The extracted data is to be written down to a
> separate file.

What do you mean by "written down to a separate file"? Do you have a
specific format in mind?

In general, you can try this:

    >>> from xml.etree import ElementTree as ET

    >>> itercontext = ET.iterparse("thefile.xml", events=("start", "end"))
    >>> event, root = next(itercontext)   # first event: start of the root <r1>

    >>> for event, element in itercontext:
    ...     if event == "end" and element.tag == "r2":
    ...         # write record subtree as XML
    ...         print(ET.tostring(element, encoding="unicode"))
    ...         root.clear()   # one record done, clean up everything

http://effbot.org/zone/element-iterparse.htm

You can also do things like

    ...         print(element.findtext("r3/r4"))

Read the ElementTree tutorial to learn how to extract your data:

http://effbot.org/zone/element.htm#searching-for-subelements

Stefan
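
For the specific case in the question (records n through n+k written to their own file), a minimal sketch along the lines of the iterparse approach above might look like this. The helper name `extract_records`, the 1-based inclusive counting, and the choice to wrap the extracted records in the original root tag are all assumptions, since the question leaves the output format open. It also assumes the records are direct children of the root and that the file uses no XML namespaces.

```python
import xml.etree.ElementTree as ET


def extract_records(src, dst, n, k, record_tag="r2"):
    """Copy records n..n+k (1-based, inclusive) from src into dst.

    Hypothetical helper, not part of ElementTree. Returns the number
    of records actually written.
    """
    seen = 0      # records encountered so far
    written = 0   # records copied to the output file
    depth = 0     # nesting depth below the root element
    with open(src, "rb") as infile, open(dst, "w", encoding="utf-8") as out:
        context = ET.iterparse(infile, events=("start", "end"))
        event, root = next(context)       # first event: start of the root
        out.write("<%s>\n" % root.tag)    # assumption: re-wrap in the root tag
        for event, element in context:
            if event == "start":
                depth += 1
                continue
            depth -= 1
            if depth != 0:
                continue                  # not a direct child of the root
            if element.tag == record_tag:
                seen += 1
                if n <= seen <= n + k:
                    element.tail = "\n"   # one record per line in the output
                    out.write(ET.tostring(element, encoding="unicode"))
                    written += 1
            root.clear()                  # drop everything parsed so far
            if seen >= n + k:
                break                     # nothing more to extract
        out.write("</%s>\n" % root.tag)
    return written
```

With these assumptions, `extract_records("thefile.xml", "subset.xml", 10, 5)` would write records 10 through 15 to `subset.xml` (a made-up file name). Stopping at record n+k means the rest of a large input file is never parsed.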