I'm trying to get the HTML data off of a webpage. Let's say for the sake of argument it's the Python homepage. I've googled around and found some examples that people said worked. Here's what I've cobbled together:

#getHTML.py ############################################
import os        # needed for os.environ below
import urllib
import urllib2

proxy_info = {'user':'us3r',
              'password':'[EMAIL PROTECTED]',
              'host':'MY_PROXY',
              'port':'80'}
os.environ['HTTP_PROXY'] = 'http://%(user)s:%(password)s@%(host)s:%(port)s' % proxy_info

test_url = "http://www.python.org/index.html"
handle = urllib2.urlopen(test_url)
#handle = urllib.urlopen(test_url)
txt = handle.read().lower()
handle.close()
print "Text: "
print txt
#################################
#end getHTML.py

When I run this with urllib2 I get (with or without a dummy password):

Traceback (most recent call last):
  File "P:\My Documents\Projects\Python\validate_zipcodes.py", line 103, in ?
    handle = urllib2.urlopen(test_url)
  File "C:\Python23\lib\urllib2.py", line 129, in urlopen
    return _opener.open(url, data)
  File "C:\Python23\lib\urllib2.py", line 326, in open
    '_open', req)
  File "C:\Python23\lib\urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "C:\Python23\lib\urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "C:\Python23\lib\urllib2.py", line 886, in do_open
    raise URLError(err)
urllib2.URLError: <urlopen error (7, 'getaddrinfo failed')>

When I run it with urllib.urlopen, I get:

Traceback (most recent call last):
  File "P:\My Documents\Projects\Python\validate_zipcodes.py", line 104, in ?
    handle = urllib.urlopen(test_url)
  File "C:\Python23\lib\urllib.py", line 76, in urlopen
    return opener.open(url)
  File "C:\Python23\lib\urllib.py", line 181, in open
    return getattr(self, name)(url)
  File "C:\Python23\lib\urllib.py", line 287, in open_http
    h = httplib.HTTP(host)
  File "C:\Python23\lib\httplib.py", line 1009, in __init__
    self._setup(self._connection_class(host, port, strict))
  File "C:\Python23\lib\httplib.py", line 507, in __init__
    self._set_hostport(host, port)
  File "C:\Python23\lib\httplib.py", line 518, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
httplib.InvalidURL: nonnumeric port: '[EMAIL PROTECTED]@MY_PROXY:80'

Obviously, going through Internet Explorer works, so the network itself is fine. Has anyone else had a similar issue? I don't know how the proxy is set up here, so is it possible that the proxy is causing this?

Any help is much appreciated. Thanks for at least reading this far!

M@
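
One thing that may be worth checking, given the "nonnumeric port" message: the second traceback shows the credentials ending up in the part of the string that httplib parses as host:port. If the real password contains characters such as '@' or ':' (an assumption; the archive hides it), percent-encoding the credentials and handing the proxy to an explicit handler may behave better than setting HTTP_PROXY by hand. The following is only a rough sketch, written with Python 3 names (urllib.request, urllib.parse.quote); the rough Python 2 equivalents are urllib.quote, urllib2.ProxyHandler and urllib2.build_opener. The helper name build_proxy_url and the password 'p@ss' are made up for illustration.

```python
import urllib.request
from urllib.parse import quote


def build_proxy_url(user, password, host, port):
    """Return an http:// proxy URL with percent-encoded credentials.

    build_proxy_url is a hypothetical helper, not part of any library.
    """
    # safe="" makes quote() encode '@', ':' and '/' as well, so the only
    # literal '@' left in the URL separates the credentials from the host.
    return "http://%s:%s@%s:%s" % (
        quote(user, safe=""), quote(password, safe=""), host, port)


# Placeholder values; 'p@ss' is an invented password containing '@'.
proxy_url = build_proxy_url("us3r", "p@ss", "MY_PROXY", "80")

# Route http requests through the proxy explicitly instead of os.environ.
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({"http": proxy_url}))

if __name__ == "__main__":
    print(proxy_url)
    # opener.open("http://www.python.org/index.html") would perform the
    # actual fetch; it is left out here because it needs a reachable proxy.
```

Whether this is enough depends on the proxy: if it requires NTLM or another scheme that Internet Explorer negotiates automatically, basic credentials in the URL will not satisfy it.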