You're welcome! Also, of course, parsing XML is a very common task and you might be interested in using one of the standard modules for that, e.g. http://docs.python.org/lib/module-xml.parsers.expat.html
Then all the tricky parsing work has been done for you.

Jason

On Sep 17, 9:31 am, duikboot <[EMAIL PROTECTED]> wrote:
> Thank you very much, it works. I guess I didn't read it right.
>
> Arjen
>
> On Sep 17, 3:22 pm, Jason Drew <[EMAIL PROTECTED]> wrote:
>
> > You just need a one-character addition to your regex:
> >
> > regex = re.compile(r'<organisatie.*?</organisatie>', re.S)
> >
> > Note, there is now a question mark (?) after the .*
> >
> > By default, regular expressions are "greedy" and will grab as much
> > text as possible when making a match. So your original expression was
> > grabbing everything between the first opening tag and the last closing
> > tag. The question mark says, don't be greedy, and you get the
> > behaviour you need.
> >
> > This is covered in the documentation for the re
> > module. http://docs.python.org/lib/module-re.html
> >
> > Jason
> >
> > On Sep 17, 9:00 am, duikboot <[EMAIL PROTECTED]> wrote:
> >
> > > Hello,
> > >
> > > I am trying to extract a list of strings from a text. I am looking it
> > > for hours now, googling didn't help either.
> > > Could you please help me?
> > >
> > > >>>s = """
> > > >>>\n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"""
> > > >>> regex = re.compile(r'<organisatie.*</organisatie>', re.S)
> > > >>> L = regex.findall(s)
> > > >>> print L
> > >
> > > ['organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> > > \n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie']
> > >
> > > I expected:
> > > [('organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>
> > > \n<organisatie>), (<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</
> > > organisatie')]
> > >
> > > I must be missing something very obvious.
> > >
> > > Greetings Arjen
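
For anyone finding this thread later, here is a minimal sketch of both approaches discussed above, written for Python 3 (the thread itself uses Python 2). The sample string is a close copy of the one in the original question. The names `greedy`, `lazy` and `profile_ids` are only illustrative, and wrapping the text in a `<root>` element is my own assumption: expat needs a single top-level element, which the sample does not have.

```python
import re
import xml.parsers.expat

# Sample text, adapted from the original question.
s = (
    "\n<organisatie>\n<Profiel_Id>28996</Profiel_Id>\n</organisatie>"
    "\n<organisatie>\n<Profiel_Id>28997</Profiel_Id>\n</organisatie>"
)

# Greedy: .* runs on to the LAST closing tag, so both blocks end up in one match.
greedy = re.compile(r'<organisatie.*</organisatie>', re.S)
# Non-greedy: .*? stops at the FIRST closing tag, giving one match per block.
lazy = re.compile(r'<organisatie.*?</organisatie>', re.S)

greedy_matches = greedy.findall(s)
lazy_matches = lazy.findall(s)
print(len(greedy_matches), len(lazy_matches))


def profile_ids(fragment):
    """Collect the text of every <Profiel_Id> element using expat."""
    ids = []
    buf = None  # list of text chunks while inside <Profiel_Id>, else None

    def start(name, attrs):
        nonlocal buf
        if name == "Profiel_Id":
            buf = []

    def chars(data):
        if buf is not None:
            buf.append(data)

    def end(name):
        nonlocal buf
        if name == "Profiel_Id":
            ids.append("".join(buf))
            buf = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    # Hypothetical <root> wrapper so the fragment forms one well-formed document.
    parser.Parse("<root>" + fragment + "</root>", True)
    return ids


print(profile_ids(s))
```

The regex version is fine for a quick extraction like this one. The parser version tends to hold up better if the tags later gain attributes or the nesting changes.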