"Frank Potter" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> pyparsing is cool.
> but use only re is also OK
> # -*- coding: UTF-8 -*-
> import urllib2
> html=urllib2.urlopen(ur"http://www.yahoo.com/").read()
>
> import re
> r=re.compile('<img\s+src="(?P<image>[^"]+)"[^>]*>',re.IGNORECASE)
> for m in r.finditer(html):
>     print m.group('image')
>

Ouch - this fails to match any <img> tag that has some other attribute, such
as "height" or "width", before the "src" attribute.  www.yahoo.com has
several such tags.
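
To make the failure concrete, here is a small sketch that runs the same pattern against two made-up tags (the sample strings are my own, not taken from the Yahoo page):

```python
import re

# Same pattern as in the quoted post, written as a raw string.
pattern = re.compile(r'<img\s+src="(?P<image>[^"]+)"[^>]*>', re.IGNORECASE)

# Hypothetical sample tags, invented for illustration.
src_first = '<img src="a.gif" width="10">'
src_later = '<img width="10" height="20" src="b.gif">'

matches_first = [m.group("image") for m in pattern.finditer(src_first)]
matches_later = [m.group("image") for m in pattern.finditer(src_later)]

print(matches_first)  # the tag with src first is found
print(matches_later)  # the tag with src after width/height is missed
```

The second list comes back empty because the pattern requires "src" to follow "<img" immediately.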

On the other hand, pyparsing's makeHTMLTags defines a starting tag
expression that looks for (conceptually):

    < tagname ZeroOrMore(attrname '=' value) Optional('/') >

and does not assume that the first attribute is "src", or anything else for
that matter.

The returned results make the tag attributes accessible as object attributes
or dictionary keys.
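
A minimal sketch of that approach, assuming pyparsing is installed and using a made-up HTML snippet rather than the live Yahoo page:

```python
from pyparsing import makeHTMLTags

# Hypothetical sample HTML, invented for illustration.
html = ('<p><img width="10" height="20" src="logo.gif" alt="logo">'
        '<IMG SRC="b.png"/></p>')

# makeHTMLTags returns a pair of expressions: (start tag, end tag).
img_start, _img_end = makeHTMLTags("img")

# scanString yields (tokens, start, end) for each match in the text.
sources = []
for tokens, _start, _end in img_start.scanString(html):
    # Attribute access and dictionary-style access give the same value.
    assert tokens.src == tokens["src"]
    sources.append(tokens.src)

print(sources)
```

Both tags should be picked up here, regardless of attribute order or the case of the tag and attribute names.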

-- Paul

