On Dec 11, 4:05 pm, Chris <[EMAIL PROTECTED]> wrote:
> I'm trying to find the contents of an XML tag. Nothing fancy. I don't
> care about parsing child tags or anything. I just want to get the raw
> text. Here's my script:
>
> import re
>
> data = """
> <?xml version='1.0'?>
> <body>
> <div class='default'>
> here's some text!
> </div>
> <div class='default'>
> here's some text!
> </div>
> <div class='default'>
> here's some text!
> </div>
> </body>
> """
>
> tagName = 'div'
> pattern = re.compile('<%(tagName)s\s[^>]*>[.\n\r\w\s\d\D\S\W]*[^(%(tagName)s)]*'
>                      % dict(tagName=tagName))
>
> matches = pattern.finditer(data)
> for m in matches:
>     contents = data[m.start():m.end()]
>     print repr(contents)
>     assert tagName not in contents
>
> The problem I'm running into is that the [^%(tagName)s]* portion of my
> regex is being ignored, so only one match is being returned, starting
> at the first <div> and ending at the end of the text, when it should
> end at the first </div>. For this example, it should return three
> matches, one for each div.
>
> Is what I'm trying to do possible with Python's Regex library? Is
> there an error in my Regex?
>
> Thanks,
> Chris
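A note on why the quoted pattern returns a single match: after string formatting, [^(%(tagName)s)] becomes [^(div)], which is a character class excluding the individual characters (, d, i, v and ). It does not mean "anything except the string div". The class before it, [.\n\r\w\s\d\D\S\W]*, matches any character at all and is greedy, so it runs to the end of the text. One possible way to get the three matches is a lazy .*? with re.DOTALL, stopping at the first closing tag. The following is only a minimal sketch, assuming the tags are never nested inside a tag of the same name and are not self-closing. The names tag_name and contents are chosen here for illustration, and print() is written with parentheses so the snippet also runs on Python 3.

```python
import re

data = """
<?xml version='1.0'?>
<body>
<div class='default'>
here's some text!
</div>
<div class='default'>
here's some text!
</div>
<div class='default'>
here's some text!
</div>
</body>
"""

tag_name = 'div'  # illustrative name; the quoted script calls it tagName

# Opening tag with optional attributes, then a lazy capture of any text
# (re.DOTALL lets '.' match newlines), ending at the first closing tag.
# Assumption: no nested or self-closing tags with the same name.
pattern = re.compile(
    r'<%(tag)s(?=[\s/>])[^>]*>(.*?)</%(tag)s\s*>' % {'tag': re.escape(tag_name)},
    re.DOTALL,
)

# group(1) is the raw text between the opening and closing tags.
contents = [m.group(1) for m in pattern.finditer(data)]
for text in contents:
    print(repr(text))
```

Because the capture is lazy, each match ends at the nearest </div> instead of the last one. For anything beyond flat markup like this, a real XML parser is usually the safer choice.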
print re.findall(r'<%s(?=[\s/>])[^>]*>' % 'div', r)
["<div class='default'>", "<div class='default'>", "<div class='default'>"]

HTH
Harvey
-- 
http://mail.python.org/mailman/listinfo/python-list