Re: Python regex question

Gerhard Häring Wed, 11 Jun 2008 04:39:56 -0700

Tim van der Leeuw wrote:

Hi,
I'm trying to create a regular expression for matching some particular XML strings. I want to extract the contents of a particular XML tag, only if it follows one tag, but not follows another tag. Complicating this, is that there can be any number of other tags in between. [...]


Sounds like this would be easier to implement using Python's SAX API.

Here's a short example that does something similar to what you want to achieve:


import xml.sax

test_str = """
<xml>
<ignore/>
<foo x="1" y="2"/>
<noignore/>
<foo x="3" y="4"/>
</xml>
"""

class MyHandler(xml.sax.handler.ContentHandler):
    def __init__(self):
        xml.sax.handler.ContentHandler.__init__(self)
        self.ignore_next = False

    def startElement(self, name, attrs):
        if name == "ignore":
            self.ignore_next = True
            return
        elif name == "foo":
            if not self.ignore_next:
                # handle the element you're interested in here
                print("MY ELEMENT %s with %s" % (name, dict(attrs)))

        self.ignore_next = False

xml.sax.parseString(test_str, MyHandler())

In this case, this looks much clearer and easier to understand to me than regular expressions.
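
The example above resets the flag on every element, so it only looks at the tag immediately before <foo>. If arbitrary tags may sit in between, one possible variant keeps the flag until the next marker tag is seen. This is only a sketch under assumptions: it targets Python 3, the names StickyHandler and collect_foo and the <other/> filler tag are made up for illustration, and it collects the attributes in a list instead of printing them.

```python
import xml.sax
import xml.sax.handler

# Hypothetical sample: "ignore", "noignore" and "foo" are reused from the
# example above; <other/> stands in for unrelated tags in between.
SAMPLE = """
<xml>
<ignore/>
<other/>
<foo x="1" y="2"/>
<noignore/>
<other/>
<other/>
<foo x="3" y="4"/>
</xml>
"""

class StickyHandler(xml.sax.handler.ContentHandler):
    """Remembers the most recent marker tag across intervening elements."""

    def __init__(self):
        super().__init__()
        self.ignoring = False   # True after <ignore/>, False after <noignore/>
        self.matches = []       # attribute dicts of the accepted <foo> elements

    def startElement(self, name, attrs):
        if name == "ignore":
            self.ignoring = True
        elif name == "noignore":
            self.ignoring = False
        elif name == "foo" and not self.ignoring:
            self.matches.append(dict(attrs))
        # Any other tag leaves the flag untouched.

def collect_foo(xml_text):
    """Parse xml_text and return the attributes of every accepted <foo>."""
    handler = StickyHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.matches

if __name__ == "__main__":
    print(collect_foo(SAMPLE))
```

With the sample above, only the second <foo> should be collected, because the first one still falls under the preceding <ignore/> even though <other/> sits in between.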


-- Gerhard
