[EMAIL PROTECTED] wrote:
> Hi all,
>
> I have an XML file with the following structure::
>
> <r1>
>   <r2>-----|
>     <r3>   |
>       <r4> |
>         .  |
>         .  | --------------------> constitutes one record.
>         .  |
>         .  |
>         .  |
>       </r4>|
>     </r3>  |
>   </r2>----|
>   <r2>
>     .
>     .
>     . -----------------------|
>     .                        |
>     .                        |
>     .                        |----------------------> there are n records in between....
>     .                        |
>     .                        |
>     .                        |
>     . -----------------------|
>     .
>     .
>   </r2>
>   <r2>-----|
>     <r3>   |
>       <r4> |
>         .  |
>         .  | --------------------> constitutes one record.
>         .  |
>         .  |
>         .  |
>       </r4>|
>     </r3>  |
>   </r2>----|
> </r1>
>
> Here <r1> is the main root tag of the XML, and <r2>...</r2>
> constitutes one record. What I would like to do is
> to extract everything (xml tags and data) between nth <r2> tag and
> (n+k)th <r2> tag. The extracted data is to be
> written down to a separate file.
>
> Thanks...
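
You could create a generator expression out of it:

    txt = """<r1>
    <r2><r3><r4>1</r4></r3></r2>
    <r2><r3><r4>2</r4></r3></r2>
    <r2><r3><r4>3</r4></r3></r2>
    <r2><r3><r4>4</r4></r3></r2>
    <r2><r3><r4>5</r4></r3></r2>
    </r1>
    """
    # Index of the last piece, i.e. the text after the final </r2>
    l = len(txt.split('r2>'))-1
    # Keep only the pieces that hold a record body; the fragments between
    # two records reduce to an empty string once '<', '>' and whitespace
    # are removed, so they are filtered out.
    a = ('<r2>%sr2>'%i for j,i in enumerate(txt.split('r2>'))
         if 0 < j < l and i.replace('>','').replace('<','').strip())

Now you have a generator that you can iterate through with next(a) (a.next() on Python 2). Alternatively, you could create a list out of it by replacing the outer parentheses with square brackets. Note that this relies on the <r2> tags being written plainly, without attributes.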
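
To get from that generator to what the question actually asks for (records n through n+k written to a separate file), one option is to slice it with itertools.islice. The following is only a sketch under stated assumptions: the helper names iter_records and write_records, the 1-based inclusive numbering of records, and the <r1> wrapper written around the output are all made up for illustration and are not from the original post. It reuses the same split trick as the generator expression, so it shares its limits (plain tags, no attributes, no nested <r2>).

```python
from itertools import islice


def iter_records(txt, tag="r2"):
    """Yield each <tag>...</tag> record as a string.

    Hypothetical helper: assumes plain tags without attributes and
    no nested <tag> elements.
    """
    marker = tag + ">"
    pieces = txt.split(marker)
    last = len(pieces) - 1
    for j, piece in enumerate(pieces):
        # Skip the text before the first tag and after the last one, plus
        # the whitespace-only fragments that sit between two records.
        if 0 < j < last and piece.replace(">", "").replace("<", "").strip():
            yield "<%s>%s%s" % (tag, piece, marker)


def write_records(txt, n, k, path):
    """Write records n through n+k to path and return them as a list.

    Assumption: records are numbered from 1 and both ends are included.
    """
    selected = list(islice(iter_records(txt), n - 1, n + k))
    with open(path, "w") as out:
        out.write("<r1>\n")  # assumed wrapper so the output has a single root
        for record in selected:
            out.write(record + "\n")
        out.write("</r1>\n")
    return selected
```

With the five-record txt from the example, write_records(txt, 2, 2, "subset.xml") would write records 2, 3 and 4 to subset.xml (the filename is just a placeholder).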