duikboot wrote:
> Hello,
>
> I am trying to extract a list of strings from a text. I am looking it
> for hours now, googling didn't help either.
> Could you please help me?
>
>>>>s = """
>>>>\n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""
>>>> regex = re.compile(r'<organisatie.*</organisatie>', re.S)
>>>> L = regex.findall(s)
>>>> print L
> ['organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> \n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie']
>
> I expected:
> [('organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> \n<organisatie>), (<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</
> organisatie')]
>
> I must be missing something very obvious.

Don't use regular expressions to process XML. They are not the right tool for the job, and even if simple cases like yours can often be made to work initially, the longer you work with them, the more complex and troublesome the code gets.

Instead, use the right tool, for example lxml. It has XPath expressions built in, which do the job:

    from lxml import etree

    # The fragment is wrapped in a <root> element so that it parses as one document.
    tree = etree.fromstring("""<root><organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie></root>""")

    # Print the text of every Profiel_Id that sits inside an organisatie element.
    for feld in tree.xpath('//organisatie/Profiel_Id'):
        print(feld.text)

Diez
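
If lxml is not installed, roughly the same approach should work with the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath. The following is a minimal sketch under some assumptions: it targets Python 3 (encoding="unicode" is not available in Python 2), the <root> wrapper is needed only because the fragment from the question has several top-level elements, and the variable names fragment, ids and blocks are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The fragment from the question has several top-level elements, so it is
# wrapped in an artificial <root> element to make it well-formed XML.
fragment = (
    "\n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>"
    "\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"
)
tree = ET.fromstring("<root>" + fragment + "</root>")

# ElementTree's path syntax: './/' searches all descendants of the root.
ids = [feld.text for feld in tree.findall(".//organisatie/Profiel_Id")]
print(ids)

# One serialized string per <organisatie> element, which is closer to the
# list the question asked for. strip() drops the whitespace that follows
# each element in the source text.
blocks = [
    ET.tostring(org, encoding="unicode").strip()
    for org in tree.findall("organisatie")
]
print(blocks)
```

Because the parser tracks element boundaries itself, each <organisatie> comes back as its own item, so the problem of a greedy `.*` swallowing everything between the first opening tag and the last closing tag does not arise.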