Re: Retrieve url's of all jpegs at a web page URL

Paul McGuire Tue, 15 Sep 2009 22:16:29 -0700

On Sep 15, 11:32 pm, Stefan Behnel <[email protected]> wrote:
> Also untested:
>
>         from lxml import html
>
>         doc = html.parse(page_url)
>         doc.make_links_absolute(page_url)
>
>         urls = [ img.src for img in doc.xpath('//img') ]
>
> Then use e.g. urllib2 to save the images.


Looks similar to what a pyparsing approach would look like:

    from pyparsing import makeHTMLTags, htmlComment

    import urllib
    html = urllib.urlopen(url).read()

    imgTag = makeHTMLTags("img")[0]
    imgTag.ignore(htmlComment)

    urls = [ img.src for img in imgTag.searchString(html) ]

-- Paul
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Retrieve url's of all jpegs at a web page URL

Reply via email to