Andre Burgaud added the comment:
Thanks @xtreak for clarifying this behavior! I can write some tests to cover
it, assuming we agree that an empty file means "unlimited access". It was
worded that way in the old internet draft from 1996 (section 3.2.1 in
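Something like the following could serve as such a test (a minimal sketch
assuming that interpretation; the test class and test name are hypothetical,
not existing tests in the test suite):

import unittest
import urllib.robotparser


class EmptyRobotsTxtTest(unittest.TestCase):
    # Hypothetical test: an empty robots.txt grants unlimited access.

    def test_empty_file_allows_everything(self):
        parser = urllib.robotparser.RobotFileParser()
        # Feed the parser an empty body, which is what read() passes on
        # after fetching a zero-byte robots.txt.
        parser.parse("".splitlines())
        self.assertTrue(parser.can_fetch("*", "http://example.com/"))
        self.assertTrue(
            parser.can_fetch("MyBot", "http://example.com/private/page.html"))


if __name__ == "__main__":
    unittest.main()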
Karthikeyan Singaravelan added the comment:
There is a behavior change. parse() sets the modified time, and unless the
modified time is set, the can_fetch method returns False. In Python 2 the
parse method was called only when the file was non-empty [0], but in Python 3
it is always called, even for an empty file.
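A quick way to see this on Python 3, calling parse() directly with an empty
body so no network fetch is needed:

import urllib.robotparser

parser = urllib.robotparser.RobotFileParser()
parser.parse("".splitlines())  # what read() does after fetching an empty body
print(parser.mtime())          # non-zero, because parse() called modified()
print(parser.can_fetch("*", "http://example.com/anything"))  # True on Python 3

On the Python 2 path described above, read() skipped parse() for an empty
body, so the modified time stayed unset and can_fetch() returned False.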
Andre Burgaud added the comment:
Hi,
Is this ticket still relevant for Python 3.8?
While running some tests with an empty robots.txt file, I realized that the
parser was returning "ALLOWED" for any path (as per the current draft of the
Robots Exclusion Protocol:
https://tools.ietf.org/html/draft-kos
larsfuse added the comment:
> (...) refers users, for file structure, to
> http://www.robotstxt.org/orig.html. This says nothing about the effect of an
> empty file, so I don't see this as a bug.
That is incorrect. From that URL you can find:
> The presence of an empty "/robots.txt" file has no explicit associated
> semantics, it will be treated as if it was not present, i.e. all robots will
> consider themselves welcome.
Terry J. Reedy added the comment:
https://docs.python.org/2.7/library/robotparser.html#module-robotparser
and
https://docs.python.org/3/library/urllib.robotparser.html#module-urllib.robotparser
refers users, for file structure, to http://www.robotstxt.org/orig.html.
This says nothing about the effect of an empty file, so I don't see this as a
bug.
New submission from larsfuse:
The standard (http://www.robotstxt.org/robotstxt.html) says:
> To allow all robots complete access:
> User-agent: *
> Disallow:
> (or just create an empty "/robots.txt" file, or don't use one at all)
Here I give Python an empty file:
$ curl http://10.223.68.186/r
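The same setup can be reproduced locally without that host (a minimal sketch
using a throwaway HTTP server that returns a zero-byte robots.txt; the
address and paths are only illustrative):

import http.server
import threading
import urllib.robotparser


class EmptyRobots(http.server.BaseHTTPRequestHandler):
    # Hypothetical test server: answer every GET with an empty 200 body,
    # mimicking an empty /robots.txt.
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = http.server.HTTPServer(("127.0.0.1", 0), EmptyRobots)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/robots.txt"
parser = urllib.robotparser.RobotFileParser(url)
parser.read()
print(parser.can_fetch("*", "/secret/page.html"))
server.shutdown()

On Python 3.8 this prints True ("allowed"); this report was filed because an
older version gave the opposite result for the same empty file.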