On Jan 11, 2009, at 10:05 PM, James Mills wrote:

On Mon, Jan 12, 2009 at 12:58 PM, Philip Semanchuk <phi...@semanchuk.com> wrote:

On Jan 11, 2009, at 8:59 PM, James Mills wrote:

Hey all,

The following fails for me:

from urllib2 import urlopen
f = urlopen("http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml")

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/urllib2.py", line 124, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.6/urllib2.py", line 389, in open
    response = meth(req, response)
  File "/usr/lib/python2.6/urllib2.py", line 502, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.6/urllib2.py", line 427, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.6/urllib2.py", line 361, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.6/urllib2.py", line 510, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden


However, that _same_ url works perfectly fine on the
same machine (and same network) using any of:
* curl
* wget
* elinks
* firefox

Any helpful ideas?

The remote server doesn't like your user agent?

It'd be easier to help if you post a working sample.

That was a working sample!

Oooops, I guess it is my brain that's not working, then! Sorry about that.

I tried your sample and got the 403. This works for me:

>>> import urllib2
>>> user_agent = "Mozilla/5.001 (windows; U; NT4.0; en-US; rv:1.0) Gecko/25250101"
>>> url = "http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml"
>>> req = urllib2.Request(url, None, { 'User-Agent' : user_agent})
>>> f = urllib2.urlopen(req)
>>> s=f.read()
>>> f.close()
>>> print s
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<rss version="2.0">
  <channel>
  <title>Chromium-Announce Google Group</title>
  <link>http://groups.google.com/group/chromium-announce</link>
<description>This list is intended for important product announcements that affect the majority of
etc.
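For anyone reading this on Python 3, where urllib2 was folded into urllib.request, here's a minimal sketch of the same workaround. The browser-style UA string is just an example; any value that doesn't look like the default Python one should do:

```python
# Python 3 equivalent of the urllib2 workaround above.
# urllib2's Request and urlopen now live in urllib.request.
import urllib.request

url = "http://groups.google.com/group/chromium-announce/feed/rss_v2_0_msgs.xml"
# Example browser-like User-Agent; the exact string is not important.
user_agent = "Mozilla/5.001 (windows; U; NT4.0; en-US; rv:1.0) Gecko/25250101"

# The custom header is attached to the Request before any bytes hit the wire.
req = urllib.request.Request(url, data=None, headers={"User-Agent": user_agent})

# Calling urllib.request.urlopen(req) would then perform the actual fetch,
# sending the custom User-Agent instead of the default "Python-urllib/x.y".
```

Note that urllib.request normalizes header names internally (stored as "User-agent"), so `req.get_header("User-agent")` is how you read it back.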


Why Google would deny access to services from unknown User-Agents is beyond me, especially since in most cases User-Agent strings are not strict.

Some sites ban UAs that look like bots. I know of a Java-based bot with a distinctive UA that was really badly behaved when visiting my server: it ignored robots.txt, fetched pages as quickly as it could, etc. That was worthy of banning. FWIW, when I try the code above with a UA of "funny fish" it still works OK, so it looks like the groups.google.com server has it out for UAs with "Python" in them, not just unknown ones.

I'm sure that if you changed wget's UA string to something Pythonic it would start to fail too.


Cheers
Philip
--
http://mail.python.org/mailman/listinfo/python-list