"Frank Potter" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > pyparsing is cool. > but use only re is also OK > # -*- coding: UTF-8 -*- > import urllib2 > html=urllib2.urlopen(ur"http://www.yahoo.com/").read() > > import re > r=re.compile('<img\s+src="(?P<image>[^"]+)"[^>]*>',re.IGNORECASE) > for m in r.finditer(html): > print m.group('image') >
Ouch - this fails to match any <img> tag that has some other attribute,
such as "height" or "width", before the "src" attribute. www.yahoo.com has
several such tags.

On the other hand, pyparsing's makeHTMLTags defines a starting tag
expression that looks for (conceptually):

    < tagname ZeroOrMore(attrname '=' value) Optional('/') >

and does not assume that the first attribute is "src", or anything else for
that matter. The returned results make the tag attributes accessible as
object attributes or dictionary keys.

-- Paul
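
A minimal sketch of the failure described above, written for Python 3 rather than the Python 2 of the quoted post. The sample markup and the names SAMPLE, regex_sources, ImgSrcCollector and parser_sources are made up for illustration. It uses the standard library's html.parser instead of pyparsing, so it is not makeHTMLTags itself, only a stand-in that shares the relevant property: attributes are read as a set of name/value pairs, so the position of "src" does not matter.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup; the second tag puts "width" before "src".
SAMPLE = (
    '<img src="a.png" alt="first">'
    '<img width="10" height="20" src="b.png">'
    '<IMG SRC="c.png">'
)

# The pattern from the quoted post, written as a raw string.
SRC_FIRST = re.compile(r'<img\s+src="(?P<image>[^"]+)"[^>]*>', re.IGNORECASE)


def regex_sources(html):
    """Return the src values found by the src-must-come-first pattern."""
    return [m.group("image") for m in SRC_FIRST.finditer(html)]


class ImgSrcCollector(HTMLParser):
    """Collect the src of every <img>, wherever the attribute appears."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased;
        # attrs is a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src is not None:
                self.sources.append(src)


def parser_sources(html):
    """Return the src values found by the attribute-order-agnostic parser."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    print(regex_sources(SAMPLE))   # the tag with "width" first is skipped
    print(parser_sources(SAMPLE))  # all three tags are reported
```

On this sample the regular expression finds two of the three images, while the parser-based version finds all three.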