Eduardo A. Bustamante López added the comment:

Hi Senthil,
> I fail to see the bug in here. Robotparser module is for reading and
> parsing the robots.txt file, the module responsible for fetching it
> could be urllib.

You're right, but robotparser's read() calls urllib.request.urlopen to fetch the robots.txt file. If robotparser accepted a file object, or something like that, instead of a URL, I wouldn't consider this a bug, but it doesn't: the default behaviour is to fetch the file itself, using urlopen.

I'm also aware that you normally shouldn't have to set a specific user-agent just to fetch robots.txt. But that is not the case with Wikipedia: in my case, it returned 403 for the default urllib user-agent. And since there is no documented way to specify a particular user-agent in robotparser, or to feed it a file object, I decided to report this. Only after reading the source of 2.7.x and 3.x can one find work-arounds for the problem, because the documentation never makes clear how robotparser requests robots.txt.
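For reference, here is a minimal sketch (Python 3) of the kind of work-around I mean: fetch robots.txt yourself with an explicit User-Agent header and hand the lines to RobotFileParser.parse() instead of calling read(). The function name and user-agent string are just examples, not anything from the module:

    import urllib.request
    from urllib.robotparser import RobotFileParser

    def fetch_robots(url, user_agent):
        # Hypothetical helper: request robots.txt with an explicit
        # User-Agent instead of urllib's default, which some sites
        # (Wikipedia, in my case) answer with a 403.
        req = urllib.request.Request(url, headers={"User-Agent": user_agent})
        rp = RobotFileParser(url)
        with urllib.request.urlopen(req) as resp:
            # parse() accepts the lines directly, bypassing read()
            # and its internal urlopen() call.
            rp.parse(resp.read().decode("utf-8").splitlines())
        return rp

    rp = fetch_robots("http://en.wikipedia.org/robots.txt", "MyBot/1.0")
    print(rp.can_fetch("MyBot/1.0", "http://en.wikipedia.org/wiki/Python"))

Nothing here is obvious from the documentation, though; you only find out that parse() can substitute for read() by reading the source.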
----------
_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15851>
_______________________________________