> Pretty straight forward...but what I'm finding is if the
> url is pointing to a file that is not there, the server
> returns a file that's a web page displaying a 404 error.
>
> Anyone have any recommendations for handling this?

You're right, that is NOT documented in a way that's easy to find! What
I was able to find is how to do what you want using urllib2 instead of
urllib. I found an old message thread that touches on the topic:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/3f02bee97a689927/88c7bfec87e18ba9?q=%22http+status%22+%2Burllib&rnum=3#88c7bfec87e18ba9

(also accessible as http://tinyurl.com/952dw). Here's a quick summary:

-----------------------------------------------------------------------
Newsgroups: comp.lang.python
From: Ivan Karajas
Date: Wed, 28 Apr 2004 23:03:54 -0800
Subject: Re: 404 errors

On Tue, 27 Apr 2004 10:46:47 +0200, Tut wrote:
> Tue, 27 Apr 2004 11:00:57 +0800, Derek Fountain wrote:
>> Some servers respond with a nicely formatted bit of HTML explaining the
>> problem, which is fine for a human, but not for a script. Is there some
>> flag or something definitive on the response which says "this is a 404
>> error"?
> Maybe catch the urllib2.HTTPError?

This kind of answers the question. urllib will let you read whatever it
receives, regardless of the HTTP status; you need to use urllib2 if you
want to find out the status code when a request results in an error (any
HTTP status beginning with a 4 or 5). This can be done like so:

    import urllib2
    try:
        asock = urllib2.urlopen("http://www.foo.com/qwerty.html")
    except urllib2.HTTPError, e:
        print e.code

The value in urllib2.HTTPError.code comes from the first line of the web
server's HTTP response, just before the headers begin, e.g. "HTTP/1.1
200 OK", or "HTTP/1.1 404 Not Found".

One thing you need to be aware of is that some web sites don't behave as
you would expect them to; e.g.
responding with a redirection rather than a 404 error when you request a
page that doesn't exist. In these cases you might still have to rely on
some clever scripting.
----------------------------------------------------------------------

I hope that helps.

Dan
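
P.S. For anyone reading this on Python 3, where urllib2 was split into
urllib.request and urllib.error, here is a minimal sketch of the same
idea. It is not from the thread above: the helper names fetch_status
and describe and the 10-second timeout are made up for illustration,
and comparing the final URL with the requested one is only a rough
hint that a site redirected instead of returning a 404.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return (status_code, final_url) for a GET request to url.

    Hypothetical helper for illustration; the timeout is an assumption.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Success, possibly after redirects were followed automatically.
            return response.status, response.url
    except urllib.error.HTTPError as e:
        # Raised for error statuses such as 404 or 500; e.code is the status.
        return e.code, url


def describe(url):
    """Summarize what happened when url was requested (hypothetical helper)."""
    status, final_url = fetch_status(url)
    if status >= 400:
        return "HTTP error %d" % status
    if final_url != url:
        # Some sites redirect to a landing page instead of sending a 404.
        return "redirected to %s; the page may not exist" % final_url
    return "OK (%d)" % status
```

Calling describe("http://www.foo.com/qwerty.html") would report the
status for the URL used in the example above. Network failures such as
DNS errors raise urllib.error.URLError, which this sketch deliberately
leaves uncaught.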