Hi group, I'm writing a Python wrapper around a command-line utility that returns XML. The utility is flaky and gives me back poorly formed XML, with different problems in different cases. Anyway, I'm making progress. I'm not very good at regular expressions though, and was wondering if someone could help with the first step: splitting the tags out of the stdout returned by the utility.
I have the following example string, and am simply trying to split it into two XML tags:

    simplified = """2007-12-13 <tag1 attr1="text1" attr2="text2" /tag1> \n2007-12-13 <tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2> \n"""

Basically I want the two tags, and to discard anything in between, using a regular expression. Like this:

    tags = ['<tag1 attr1="text1" attr2="text2" /tag1>',
            '<tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2>']

I've tried several approaches, some of which got close, but the newline in the middle of one of the tags broke them. The closest I've come is something like this:

    retag = re.compile(r'<.+>*')  # tried here with re.DOTALL as well
    tags = retag.findall(simplified)

Can anyone help me?

~Sean
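
For reference, here is a minimal sketch of one pattern that seems to fit the example above. It is not a general fix for malformed XML. It assumes that a literal '>' never appears inside a tag (for instance inside an attribute value), which holds for the sample string but may not hold for everything the utility emits. The character class [^>] matches any character except '>', newlines included, so re.DOTALL is not needed. The original attempt fails because '.' stops at a newline by default, which cuts the second tag short, while with re.DOTALL the greedy '.+' runs from the first '<' to the end of the string.

```python
import re

simplified = """2007-12-13 <tag1 attr1="text1" attr2="text2" /tag1> \n2007-12-13 <tag2 attr1="text1" attr2="text2" attr3="text3\n" /tag2> \n"""

# Assumption: no literal '>' occurs inside a tag, so the first '>' after
# a '<' always closes that tag.
# '<'     opening angle bracket
# [^>]+   one or more characters that are not '>', newlines included
# '>'     closing angle bracket
retag = re.compile(r'<[^>]+>')

# Text between the tags (dates, whitespace) is simply never matched.
tags = retag.findall(simplified)

for tag in tags:
    print(repr(tag))
```

If a '>' can turn up inside a quoted attribute value, this pattern would cut the tag short there, and the pattern would need to treat quoted strings as units.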