RE: Scraping a web page

Support Desk Tue, 07 Apr 2009 07:00:03 -0700

You could do something like the following to get the rendered page as plain
text. Lynx renders text only, so this is not a graphical screen capture.

import os
site = 'website.com'
# lynx --dump writes the formatted (rendered) page text to stdout
lines = os.popen('lynx --dump %s' % site).readlines()
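
The same idea can be sketched with subprocess, which passes the URL as a separate argument instead of interpolating it into a shell string. This is only a sketch: it assumes lynx is installed and on the PATH and that Python 3.7+ is available (for capture_output), and the names build_lynx_command and dump_rendered_text are made up for illustration.

```python
import subprocess


def build_lynx_command(url):
    """Return the argument list for a text-only dump of url (assumes lynx is on PATH)."""
    return ["lynx", "--dump", url]


def dump_rendered_text(url, run=subprocess.run):
    """Run lynx and return the rendered page as a list of lines.

    `run` is injectable so the function can be exercised without lynx installed.
    """
    result = run(
        build_lynx_command(url),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        check=True,           # raise CalledProcessError on a non-zero exit
    )
    return result.stdout.splitlines()
```

Calling dump_rendered_text('website.com') should give roughly what the os.popen version returns, minus the trailing newlines on each line.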

-----Original Message-----
From: Tim Chase [mailto:[email protected]] 
Sent: Tuesday, April 07, 2009 7:45 AM
To: Ronn Ross
Cc: [email protected]
Subject: Re: Scraping a web page

> f = urllib.urlopen("http://www.google.com")
> s = f.read()
>
> It is working, but it's returning the source of the page. Is there any way
> I can get almost a screen capture of the page?

This is the job of a browser -- to render the source HTML.  As 
such, you'd want to look into any of the browser-automation 
libraries to hook into IE, FireFox, Opera, or maybe using the 
WebKit/KHTML control.  You may then be able to direct it to 
render the HTML into a canvas you can then treat as an image.
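
One way this might look today is to drive a browser's headless mode from Python and have it write the rendered page to an image file. The sketch below is an assumption, not something from this thread: it presumes a Chromium-based browser that supports the --headless, --screenshot and --window-size switches, and the binary name "chromium" and the helper names are hypothetical and will differ per system.

```python
import subprocess


def build_screenshot_command(url, out_path, browser="chromium", size=(1280, 800)):
    """Build a headless-browser command line that renders url into an image file.

    The binary name and window size are assumptions; adjust them for your system.
    """
    width, height = size
    return [
        browser,
        "--headless",                            # run without a visible window
        "--screenshot=%s" % out_path,            # write the rendered page here
        "--window-size=%d,%d" % (width, height), # viewport used for rendering
        url,
    ]


def capture_page(url, out_path, run=subprocess.run, **kwargs):
    """Run the browser and return the path the screenshot was requested at.

    `run` is injectable so the function can be exercised without a browser.
    """
    run(build_screenshot_command(url, out_path, **kwargs), check=True)
    return out_path
```

For example, capture_page("http://www.google.com", "page.png") would ask the browser to save its rendering of the page as page.png.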

Another alternative might be provided by some web-services that 
will render a page as an image with various browsers and then send 
you the result.  However, these are usually either (1) 
asynchronous or (2) paid services (or both).

-tkc

--
http://mail.python.org/mailman/listinfo/python-list
