New submission from larsfuse <l...@cl.no>:
The standard (http://www.robotstxt.org/robotstxt.html) says:

> To allow all robots complete access:
> User-agent: *
> Disallow:
> (or just create an empty "/robots.txt" file, or don't use one at all)

Here I give Python an empty file:

$ curl http://10.223.68.186/robots.txt
$

Code:

import sys
import robotparser

robotsurl = sys.argv[1]  # robots.txt URL, passed on the command line
rp = robotparser.RobotFileParser()
print(robotsurl)
rp.set_url(robotsurl)
rp.read()
print("fetch /", rp.can_fetch(useragent="*", url="/"))
print("fetch /admin", rp.can_fetch(useragent="*", url="/admin"))

Result:

$ ./test.py http://10.223.68.186/robots.txt
('fetch /', False)
('fetch /admin', False)

So robotparser reads the empty robots.txt file as "all denied" and treats the whole site as blocked, instead of allowing complete access.

----------
components: Library (Lib)
messages: 331595
nosy: larsfuse
priority: normal
severity: normal
status: open
title: robotparser reads empty robots.txt file as "all denied"
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35457>
_______________________________________
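A minimal sketch (assuming Python 2.7's robotparser module, as in the report) that feeds an empty robots.txt body straight to parse() instead of fetching it with read(), to separate the parsing step from the HTTP fetch:

    import robotparser

    # Feed an empty robots.txt body directly to the parser,
    # bypassing the HTTP fetch that read() performs.
    rp = robotparser.RobotFileParser()
    rp.parse("".splitlines())  # empty file -> no rules at all

    # Per http://www.robotstxt.org/robotstxt.html an empty file means
    # complete access, so both calls should return True.
    print("fetch /", rp.can_fetch("*", "/"))
    print("fetch /admin", rp.can_fetch("*", "/admin"))

If this offline version answers True while the read()-based script above answers False, the problem lies in how read() handles the empty HTTP response rather than in the parsing of the (empty) rule set.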