Tempo wrote:
> In my last post I received some advice to use urllib.read() to get a
> whole html page as a string, which will then allow me to use
> BeautifulSoup to do what I want with the string. But when I was
> researching the 'urllib' module I couldn't find anything about its
> sub-section '.read()' ? Is that the right module to get a html page
> into a string? Or am I completely missing something here? I'll take
> this as the more likely of the two cases. Thanks for any and all help.
>
I think you've misunderstood. You call urllib.urlopen() with a URL as an
argument. The object that this call returns is file-like (insofar as
you can read it to get the content of the web page):
 >>> import urllib
 >>> page = urllib.urlopen("http://www.holdenweb.com/")
 >>> data = page.read()
 >>> print data
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=ISO-8859-1">
<meta name="generator" content="Adobe GoLive 6">
<meta http-equiv="DESCRIPTION" content="Holden Web provides architectural design of databases and information systems, with full-service implementation and support">
...
</tr>
</tbody>
</table>
</div>
</body>
</html>
 >>>

You will find there are lots of other things you can do with that
file-like object too, but reading it is the important one as far as
using BeautifulSoup goes.

regards
 Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC                     www.holdenweb.com
PyCon TX 2006                  www.python.org/pycon/
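
P.S. For anyone trying this on Python 3: urlopen() lives in urllib.request there, and read() hands back bytes rather than a str, so you need a decode step before you have a string. Below is a minimal sketch of the same open-then-read idea. The helper name fetch_page, its encoding parameter, and the data: URL used as an offline stand-in for a real address are all my own assumptions for illustration, not anything from the urllib docs or the original thread.

```python
# Python 3 sketch of the open-then-read pattern described above.
from urllib.parse import quote
from urllib.request import urlopen


def fetch_page(url, encoding="utf-8"):
    """Return the body of `url` as a str (hypothetical helper)."""
    # urlopen() returns a file-like response object.
    with urlopen(url) as page:
        raw = page.read()  # the whole body, as bytes
    # Assumption: the caller knows the page's encoding; bytes that
    # cannot be decoded are replaced rather than raising an error.
    return raw.decode(encoding, errors="replace")


# Offline demo: a data: URL stands in for a real web address,
# so nothing here touches the network.
sample_url = "data:text/html," + quote("<html><body>Hello</body></html>")
html = fetch_page(sample_url)
print(html)
```

The string that comes back is what you would then pass on to BeautifulSoup. For a real page you would swap sample_url for an http:// address and pick the encoding the page declares (the one above declares ISO-8859-1).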