For some reason, Python's "robotparser" module doesn't like Wikipedia's "robots.txt" file:
>>> import robotparser
>>> url = 'http://wikipedia.org/robots.txt'
>>> chk = robotparser.RobotFileParser()
>>> chk.set_url(url)
>>> chk.read()
>>> testurl = 'http://wikipedia.org'
>>> chk.can_fetch('Mozilla', testurl)
False
>>>

The Wikipedia robots.txt file passes robots.txt validation, and it doesn't disallow unknown browsers. But the Python parser doesn't see it that way: no matter what user agent or URL is specified, the answer for that robots.txt file is always "False". It fails under Python 2.4 on Windows and under Python 2.5 on Fedora Core.

I use "robotparser" on lots of other robots.txt files, and it normally works. It even used to work on Wikipedia's older file. But there's something in there now that robotparser doesn't like. Any ideas?

John Nagle
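P.S. In case it helps with debugging, here's the sort of check I'd try next: fetch the robots.txt directly with urllib2 and compare what comes back for different user agents. If I remember right, robotparser treats a 401 or 403 on the robots.txt fetch as "disallow everything", so a server that refuses urllib's default "Python-urllib" user agent would produce exactly this symptom. This is only a rough sketch (the user-agent strings are placeholders, and I haven't confirmed that's what Wikipedia is doing):

import urllib2

url = 'http://wikipedia.org/robots.txt'

# Try something close to urllib's stock user agent, then a browser-like one.
# Both strings are just guesses for illustration.
agents = ['Python-urllib/2.4', 'Mozilla/5.0']

for ua in agents:
    req = urllib2.Request(url, headers={'User-Agent': ua})
    try:
        resp = urllib2.urlopen(req)
        body = resp.read()
        print ua, '->', len(body), 'bytes of robots.txt'
    except urllib2.HTTPError, e:
        # A 401/403 here would explain robotparser answering "False" for everything.
        print ua, '-> HTTP error', e.code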