Re: Scraping a web page

2009-04-08 Thread Iain King
On Apr 7, 1:44 pm, Tim Chase wrote:
> > f = urllib.urlopen("http://www.google.com")
> > s = f.read()
> >
> > It is working, but it's returning the source of the page. Is there any way I
> > can get almost a screen capture of the page?
>
> This is the job of a browser -- to render the source HTML.  A
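For reference, here is a minimal Python 3 sketch of the fetch being quoted. In Python 3, `urllib.urlopen` is now `urllib.request.urlopen`. Like the original, it returns the raw HTML source, not a rendering. It decodes using the charset the server declares. Falling back to UTF-8 when none is declared is an assumption, and the name `fetch_source` is hypothetical.

```python
import urllib.request


def fetch_source(url, timeout=10):
    """Return the raw HTML of `url` as a str (the source, not a rendered page)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = response.read()
        # Use the charset from the Content-Type header if one is given;
        # the UTF-8 fallback is an assumption, not something HTTP guarantees.
        charset = response.headers.get_content_charset() or "utf-8"
    return body.decode(charset, errors="replace")


if __name__ == "__main__":
    print(fetch_source("http://www.google.com")[:200])
```

As the thread points out, turning this string into something that looks like the page is a browser's job.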

RE: Scraping a web page

2009-04-08 Thread Lawrence D'Oliveiro
In a message, Support Desk wrote:
> You could do something like below to get the rendered page.
>
> import os
> site = 'website.com'
> X = os.popen('lynx --dump %s' % site).readlines()

I wonder how easy it would be to get the page image in SVG format. I believe the Gecko HTML engine in Firefox
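The `os.popen` call above builds a shell command by string interpolation. If the site string contains shell metacharacters, the shell will interpret them. A Python 3 sketch using `subprocess.run` with an argument list avoids the shell entirely. It assumes `lynx` is installed and on `PATH` and uses lynx's `-dump` option. The helper names `lynx_command` and `dump_page` are hypothetical.

```python
import subprocess


def lynx_command(site):
    """Build the argv list for a text dump of `site`; no shell is involved."""
    return ["lynx", "-dump", site]


def dump_page(site, timeout=30):
    """Return lynx's plain-text rendering of `site` as a list of lines."""
    result = subprocess.run(
        lynx_command(site),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output to str
        timeout=timeout,
        check=True,           # raise CalledProcessError if lynx exits non-zero
    )
    # Unlike readlines(), splitlines() drops the trailing newline characters.
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in dump_page("website.com"):
        print(line)
```

Because the site is passed as a single list element, a string such as `"a.com; rm -rf /"` reaches lynx as one literal argument and is never run as a command.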

Re: Scraping a web page

2009-04-07 Thread cgoldberg
> Is there any way I
> can get almost a screen capture of the page?

I'm not sure exactly what you mean by "screen capture". But the webbrowser module in the standard lib might be of some help. You can use it to drive a web browser from Python. To load a page in your browser, you can do something
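Here is a minimal Python 3 sketch of the `webbrowser` suggestion. `webbrowser.get()` returns a controller for the default browser, or for a named one such as `"firefox"` if that is registered. Calling `open(url, new=2)` on it requests a new tab. This only displays the page; it does not hand an image or the rendered text back to Python. The wrapper name `open_page` is hypothetical.

```python
import webbrowser


def open_page(url, browser=None):
    """Ask a browser to display `url` and return the controller's result.

    `browser` is an optional registered browser name (e.g. "firefox");
    None selects the platform default. webbrowser.get() raises
    webbrowser.Error if no suitable browser can be found.
    """
    controller = webbrowser.get(browser)
    return controller.open(url, new=2)  # new=2: new tab if the browser supports it


if __name__ == "__main__":
    open_page("http://www.google.com")
```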

RE: Scraping a web page

2009-04-07 Thread Support Desk
From: Ronn Ross [mailto:ronn.r...@gmail.com]
Sent: Tuesday, April 07, 2009 9:37 AM
To: Support Desk
Subject: Re: Scraping a web page

This works great, but is there a way to do this with Firefox or something similar so I can also print the images from the site?

On Tue, Apr 7, 2009 at 9:58
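If you only need the images and not a picture of the whole page, another option is to parse the fetched HTML and collect the `<img>` sources. Relative paths are resolved against the page URL. This approach is not raised in the thread. The sketch below is Python 3 and uses only the standard library. The names `ImageCollector` and `image_urls` are hypothetical, and it will miss images loaded through CSS or JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collect the src attribute of every <img> tag, in document order."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches.
        # Self-closing <img/> tags are routed here by the default
        # handle_startendtag as well.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> without src (or with a valueless src)
                # Turn relative paths like "logo.png" into absolute URLs.
                self.sources.append(urljoin(self.base_url, src))


def image_urls(html, base_url):
    """Return absolute URLs of the images referenced by `html`."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.sources
```

Each returned URL could then be fetched with `urllib.request.urlopen` and saved or printed separately.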

RE: Scraping a web page

2009-04-07 Thread Support Desk
Ronn Ross
Cc: python-list@python.org
Subject: Re: Scraping a web page

> f = urllib.urlopen("http://www.google.com")
> s = f.read()
>
> It is working, but it's returning the source of the page. Is there any way I
> can get almost a screen capture of the page?

This is the

Re: Scraping a web page

2009-04-07 Thread Tim Chase
> f = urllib.urlopen("http://www.google.com")
> s = f.read()
>
> It is working, but it's returning the source of the page. Is there any way I
> can get almost a screen capture of the page?

This is the job of a browser -- to render the source HTML. As such, you'd want to look into any of the browser-aut