Chris Rebert wrote: > page_url = "http://the.url.here" > > with urllib.urlopen(page_url) as f: > soup = BeautifulSoup(f.read()) > for img_tag in soup.findAll("img"): > relative_url = img_tag.src > img_url = make_absolute(relative_url, page_url) > save_image_from_url(img_url) > > 2. Write make_absolute() and save_image_from_url()
Note that lxml.html provides a make_links_absolute() function. Also untested: from lxml import html doc = html.parse(page_url) doc.make_links_absolute(page_url) urls = [ img.src for img in doc.xpath('//img') ] Then use e.g. urllib2 to save the images. Stefan -- http://mail.python.org/mailman/listinfo/python-list