duikboot wrote:
> Hello,
>
> I am trying to extract a list of strings from a text. I am looking it
> for hours now, googling didn't help either.
> Could you please help me?
>
> >>> s = """
> >>> \n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""
> >>> regex = re.compile(r'<organisatie.*</organisatie>', re.S)
> >>> L = regex.findall(s)
> >>> print L
> ['organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> \n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie']
>
> I expected:
> [('organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> \n<organisatie>), (<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</
> organisatie')]
>
> I must be missing something very obvious.

Regarding the regexp, Jason gave you the answer. Another point is that, when dealing with XML, it's sometimes better to use an XML parser. Quick and dirty (your fragment has two top-level elements, so it is wrapped in a single root element first):

>>> from xml.etree import ElementTree as ET
>>> s = "<root>" + s + "</root>"
>>> tree = ET.fromstring(s)
>>> tree
<Element root at b795b2ac>
>>> tree.findall("organisatie/Profiel_Id")
[<Element Profiel_Id at b795b32c>, <Element Profiel_Id at b795b3ec>]
>>> _[0].text
'28996'
>>> [it.text for it in tree.findall("organisatie/Profiel_Id")]
['28996', '28997']

HTH
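
PS: for completeness on the regexp side. Jason's reply isn't quoted here, so this is only my guess at what it said: the usual culprit with a pattern like this is that `.*` is greedy and runs all the way to the last closing tag, and the usual fix is the non-greedy `.*?`. A minimal sketch, assuming that is indeed the issue (the names `greedy` and `lazy` are just mine, for illustration):

```python
import re

# Same sample data as in the original post.
s = (
    "\n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>"
    "\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"
)

# Greedy: .* extends to the LAST </organisatie>, giving one big match.
greedy = re.compile(r'<organisatie>.*</organisatie>', re.S)

# Non-greedy: .*? stops at the FIRST </organisatie> after each opening tag.
lazy = re.compile(r'<organisatie>.*?</organisatie>', re.S)

greedy_matches = greedy.findall(s)
lazy_matches = lazy.findall(s)

print(len(greedy_matches))  # 1
print(len(lazy_matches))    # 2
```

This works for flat data like yours, but it will break as soon as the elements nest or carry attributes, which is why I'd still go for the parser.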