On Sep 17, 9:00 am, duikboot <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I am trying to extract a list of strings from a text. I am looking it
> for hours now, googling didn't help either.
> Could you please help me?
>
> >>>s = """
> >>>\n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""
> >>> regex = re.compile(r'<organisatie.*</organisatie>', re.S)
> >>> L = regex.findall(s)
> >>> print L
>
> ['organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> \n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie']
>
> I expected:
> [('organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> \n<organisatie>), (<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</
> organisatie')]
>
> I must be missing something very obvious.

The less obvious thing that you're missing is that regular expressions
are not the best solution to every text-related problem. Thinking at a
higher level helps sometimes; for example, here you don't want to
extract "a list of strings from a text", you want to extract specific
elements from an XML data source. There are several standard and
non-standard Python packages for XML processing; look for them online.
Here's how to do it using the (3rd party) BeautifulSoup module:

>>> from BeautifulSoup import BeautifulStoneSoup
>>> BeautifulStoneSoup(s).findAll('organisatie')
[<organisatie>
<profiel_id>28996</profiel_id>
</organisatie>, <organisatie>
<profiel_id>28997</profiel_id>
</organisatie>]

HTH,
George
-- 
http://mail.python.org/mailman/listinfo/python-list
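
For completeness, here is a minimal sketch of why the quoted pattern returned a single match, and of what a standard-library alternative might look like. It assumes Python 3 and uses xml.etree.ElementTree in place of BeautifulSoup; the <root> wrapper is a hypothetical element added only because the fragment has no single root, and the variable names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

s = """
<organisatie>
<Profiel_Id>28996</Profiel_Id>
</organisatie>
<organisatie>
<Profiel_Id>28997</Profiel_Id>
</organisatie>"""

# Greedy: with re.S, ".*" runs on to the LAST </organisatie>,
# so the whole text comes back as one match.
greedy = re.findall(r'<organisatie.*</organisatie>', s, re.S)

# Non-greedy: ".*?" stops at the first closing tag it can reach,
# giving one match per element.
lazy = re.findall(r'<organisatie>.*?</organisatie>', s, re.S)

# Parser-based: wrap the fragment in a hypothetical <root> element
# (assumption: the real data may already have a root), then walk it.
root = ET.fromstring('<root>' + s + '</root>')
ids = [org.findtext('Profiel_Id') for org in root.iter('organisatie')]

print(len(greedy), len(lazy), ids)
```

The non-greedy pattern is enough for input this regular, but it would break on nested or attribute-carrying elements, which is the case for preferring a parser.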