For some reason, Python's "robotparser" module doesn't like Wikipedia's "robots.txt" file:
>>> import robotparser
>>> url = 'http://wikipedia.org/robots.txt'
>>> chk = robotparser.RobotFileParser()
>>> chk.set_url(url)
>>> chk.read()
>>> testurl = 'http://wikipedia.org'
>>> chk.can_fetch('Mozilla', testurl)
False
>>>

The Wikipedia robots.txt file passes robots.txt validation, and it doesn't disallow unknown browsers. But the Python parser doesn't see it that way: no matter what user agent or URL is specified, the answer for that robots.txt file is always "False". It fails under Python 2.4 on Windows and under Python 2.5 on Fedora Core.

I use "robotparser" on lots of other robots.txt files, and it normally works. It even used to work on Wikipedia's older file. But there's something in there now that robotparser doesn't like. Any ideas?

John Nagle
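P.S. In case it helps with debugging, here's the sort of check I'd try next: fetch the robots.txt directly with urllib2 and compare what comes back for different user agents. If I remember right, robotparser treats a 401 or 403 on the robots.txt fetch as "disallow everything", so a server that refuses urllib's default "Python-urllib" user agent would produce exactly this symptom. This is only a rough sketch (the user-agent strings are placeholders, and I haven't confirmed that's what Wikipedia is doing):

import urllib2

url = 'http://wikipedia.org/robots.txt'

# Try something close to urllib's stock user agent, then a browser-like one.
# Both strings are just guesses for illustration.
agents = ['Python-urllib/2.4', 'Mozilla/5.0']

for ua in agents:
    req = urllib2.Request(url, headers={'User-Agent': ua})
    try:
        resp = urllib2.urlopen(req)
        body = resp.read()
        print ua, '->', len(body), 'bytes of robots.txt'
    except urllib2.HTTPError, e:
        # A 401/403 here would explain robotparser answering "False" for everything.
        print ua, '-> HTTP error', e.code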