On Dec 13, 5:49 pm, Sean DiZazzo <[EMAIL PROTECTED]> wrote:
> Hi group,
>
> I'm wrapping up a command line util that returns xml in Python. The
> util is flaky, and gives me back poorly formed xml with different
> problems in different cases. Anyway I'm making progress. I'm not
> very good at regular expressions though and was wondering if someone
> could help with initially splitting the tags from the stdout returned
> from the util.
>
> I have the following example string, and am simply trying to split it
> into two xml tags...
>
> simplified = """2007-12-13 <tag1 attr1="text1" attr2="text2" /tag1>
> \n2007-12-13 <tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>
> \n"""
>
> Basically I want the two tags, and to discard anything in between
> using a reg exp. Like this:
>
> tags = ["<tag1 attr1="text1" attr2="text2" /tag1>", "<tag2
> attr1="text1" attr2="text2" attr3="text3\n" /tag2>"]
>
> I've tried several approaches, some of which got close, but the
> newline in the middle of one of the tags screwed it up. The closest
> I've been is something like this:
>
> retag = re.compile(r'<.+>*') # tried here with re.DOTALL as well
> tags = re.findall(retag)
>
> Can anyone help me?
>
> ~Sean

I found something that works, although I couldn't tell you why it
works. :)

retag = re.compile(r'<.+?>', re.DOTALL)
tags = retag.findall(simplified)

Why does that work?

~Sean
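
A short sketch of what is probably going on, comparing the greedy and non-greedy patterns on the sample string from the original post. The variable names greedy, greedy_dotall and tags are only illustrative, and the sample string is retyped here on the assumption that the line breaks in the quoted message came from mail wrapping. By default '.' does not match a newline, so the tag with the embedded "\n" cannot match at all. With re.DOTALL a greedy '.+' runs to the last '>' in the whole string, while '.+?' stops at the first '>' it reaches.

```python
import re

# Sample string from the original post: two tags, the second with a
# newline inside an attribute value.
simplified = (
    '2007-12-13 <tag1 attr1="text1" attr2="text2" /tag1>\n'
    '2007-12-13 <tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>\n'
)

# Greedy, no DOTALL: '.' cannot cross a newline, so the second tag
# (which spans two lines) never matches.
greedy = re.findall(r'<.+>', simplified)

# Greedy with DOTALL: '.+' grabs as much as possible, so everything from
# the first '<' to the last '>' comes back as one big match.
greedy_dotall = re.findall(r'<.+>', simplified, re.DOTALL)

# Non-greedy with DOTALL: '.+?' grabs as little as possible, so each
# match ends at the first '>' after its '<'.
tags = re.findall(r'<.+?>', simplified, re.DOTALL)

for tag in tags:
    print(repr(tag))
```

One caveat: the non-greedy pattern ends each match at the first '>' it sees, so a '>' inside an attribute value would cut a tag short. For output this irregular that may be acceptable, but it is worth keeping in mind.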