Bugs item #947571, was opened at 2004-05-04 09:57
Message generated for change (Comment added) made by gbrandl
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=947571&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Submitted By: M.-A. Lemburg (lemburg)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.urlopen() fails to raise exception

Initial Comment:
I've come across a strange problem: even though the docs say
that urllib.urlopen() should raise an IOError for server
errors (e.g. 404s), all versions of Python that I've tested
(1.5.2 - 2.3) fail to do so.

Example:

>>> import urllib
>>> f = urllib.urlopen('http://www.example.net/this-url-does-not-exist/')
>>> print f.read()
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>404 Not Found</TITLE>
</HEAD><BODY>
<H1>Not Found</H1>
The requested URL /this-url-does-not-exist/ was not found on this server.<P>
<HR>
<ADDRESS>Apache/1.3.27 Server at www.example.com Port 80</ADDRESS>
</BODY></HTML>

Either the docs are wrong, or the implementation has a
long-standing bug, or I am missing something.

----------------------------------------------------------------------

>Comment By: Georg Brandl (gbrandl)
Date: 2006-02-20 21:26

Message:
Logged In: YES
user_id=849994

Committed an addition to the docs in rev. 42527, 42528.

----------------------------------------------------------------------

Comment By: Mike Brown (mike_j_brown)
Date: 2004-07-10 19:20

Message:
Logged In: YES
user_id=371366

I suggest closing as Won't Fix or Not A Bug, but change the
documentation for urllib.urlopen() to read:

"""urlopen(url [, data]) -> open file-like object using
urllib._urlopener, which will be an instance of
FancyURLopener if not already set."""

The onus is still on the user to notice in the docs that
FancyURLopener will ignore HTTP error responses for which it
does not have an explicit handler, but this way they'll
at least be pointed in the right direction.

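The dispatch rule described in the comment above (an explicit per-status handler wins, otherwise the default handler decides whether to raise) can be modelled without any network access. This is only a rough sketch under stated assumptions, not urllib's actual code: the class names BaseOpener, LenientOpener and StrictOpener, the shortened handler signature (url, errcode, errmsg), and the placeholder 302 handler are all hypothetical and chosen purely for illustration.

```python
class BaseOpener(object):
    """Hypothetical stand-in for the error dispatch in urllib.URLopener."""

    def http_error(self, url, errcode, errmsg):
        # Prefer a handler named after the status code, e.g. http_error_302.
        handler = getattr(self, "http_error_%d" % errcode, None)
        if handler is not None:
            return handler(url, errcode, errmsg)
        # No explicit handler: fall back to the default policy.
        return self.http_error_default(url, errcode, errmsg)

    def http_error_default(self, url, errcode, errmsg):
        # Strict default policy: unhandled status codes raise IOError.
        raise IOError("http error %d: %s" % (errcode, errmsg))


class LenientOpener(BaseOpener):
    """Hypothetical stand-in for FancyURLopener's lenient default."""

    def http_error_302(self, url, errcode, errmsg):
        # Placeholder for an explicit handler; the real one follows redirects.
        return ("redirect", url)

    def http_error_default(self, url, errcode, errmsg):
        # Lenient default policy: hand the error response back to the caller.
        return (url, errcode, errmsg)


class StrictOpener(LenientOpener):
    # Same idea as the override suggested in this thread: reuse the base
    # class's raising default while keeping the explicit handlers.
    http_error_default = BaseOpener.http_error_default
```

With this model, LenientOpener().http_error(url, 404, "Not Found") returns the error tuple silently, while StrictOpener().http_error(url, 404, "Not Found") raises IOError. Both still route a 302 to the explicit handler, which is why callers only notice the lenient default for status codes that have no handler of their own.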
----------------------------------------------------------------------

Comment By: John J Lee (jjlee)
Date: 2004-07-10 19:08

Message:
Logged In: YES
user_id=261020

Seems a mistake to change this now. The current behaviour
should be documented, though, of course.

----------------------------------------------------------------------

Comment By: Mike Brown (mike_j_brown)
Date: 2004-07-10 18:39

Message:
Logged In: YES
user_id=371366

I suppose I could've made that example a little simpler:

class ErrorRecognizingURLopener(urllib.FancyURLopener):
    http_error_default = urllib.URLopener.http_error_default

urllib._urlopener = ErrorRecognizingURLopener()

----------------------------------------------------------------------

Comment By: Mike Brown (mike_j_brown)
Date: 2004-07-10 18:25

Message:
Logged In: YES
user_id=371366

In urllib.FancyURLopener, which is the class used by
urllib.urlopen(), there is this override of URLopener's
http_error_default:

    def http_error_default(self, url, fp, errcode, errmsg, headers):
        """Default error handling -- don't raise an exception."""
        return addinfourl(fp, headers, "http:" + url)

I don't see how this is really all that desirable, but
nevertheless it appears to be quite deliberate.

It looks like the intent in urlopen is that if you want to
use some other opener besides an instance of FancyURLopener,
you can set urllib._urlopener. This seems to work:

>>> import urllib
>>> class MyUrlOpener(urllib.FancyURLopener):
...     def http_error_default(*args, **kwargs):
...         return urllib.URLopener.http_error_default(*args, **kwargs)
...
>>> urllib._urlopener = MyUrlOpener()
>>> urllib.urlopen('http://www.example.com/this-url-does-not-exist/')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.3/urllib.py", line 76, in urlopen
    return opener.open(url)
  File "/usr/local/lib/python2.3/urllib.py", line 181, in open
    return getattr(self, name)(url)
  File "/usr/local/lib/python2.3/urllib.py", line 306, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "/usr/local/lib/python2.3/urllib.py", line 323, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "<stdin>", line 3, in http_error_default
  File "/usr/local/lib/python2.3/urllib.py", line 329, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 404, 'Not Found', <httplib.HTTPMessage instance at 0x836298c>)

----------------------------------------------------------------------

Comment By: Walter Dörwald (doerwalter)
Date: 2004-06-02 18:29

Message:
Logged In: YES
user_id=89016

This seems to work with urllib2:

>>> import urllib2
>>> f = urllib2.urlopen('http://www.example.net/this-url-does-not-exist/')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/local/lib/python2.3/urllib2.py", line 129, in urlopen
    return _opener.open(url, data)
  File "/usr/local/lib/python2.3/urllib2.py", line 326, in open
    '_open', req)
  File "/usr/local/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 901, in http_open
    return self.do_open(httplib.HTTP, req)
  File "/usr/local/lib/python2.3/urllib2.py", line 895, in do_open
    return self.parent.error('http', req, fp, code, msg, hdrs)
  File "/usr/local/lib/python2.3/urllib2.py", line 352, in error
    return self._call_chain(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 306, in _call_chain
    result = func(*args)
  File "/usr/local/lib/python2.3/urllib2.py", line 412, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list