[issue35457] robotparser reads empty robots.txt file as "all denied"

2020-01-02 Thread Andre Burgaud
Andre Burgaud added the comment: Thanks, @xtreak, for providing some clarification on this behavior! I can write some tests to cover it, assuming we agree that an empty file means "unlimited access". This was worded as such in the old Internet Draft from 1996 (section 3.2.1 in …
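
A minimal sketch of such a test, assuming the "empty file means unlimited access" reading; the test name, user agents, and URLs are hypothetical, not from any actual patch:

    import unittest
    import urllib.robotparser

    class EmptyRobotsTxtTest(unittest.TestCase):
        def test_empty_file_allows_everything(self):
            # An empty robots.txt has no lines, hence no rules.
            parser = urllib.robotparser.RobotFileParser()
            parser.parse([])
            # With no rules, every user agent should be allowed to
            # fetch every path ("unlimited access").
            self.assertTrue(parser.can_fetch("*", "http://example.com/"))
            self.assertTrue(parser.can_fetch("MyBot", "http://example.com/private"))

    if __name__ == "__main__":
        unittest.main()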

[issue35457] robotparser reads empty robots.txt file as "all denied"

2020-01-02 Thread Karthikeyan Singaravelan
Karthikeyan Singaravelan added the comment: There is a behavior change. parse() sets the modified time, and unless the modified time is set, the can_fetch() method returns False. In Python 2 the parse() method was called only when the file was non-empty [0], but in Python 3 it's always called, though …
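
A short sketch of that mechanism using Python 3's urllib.robotparser (the example URL is a placeholder):

    import urllib.robotparser

    rp = urllib.robotparser.RobotFileParser()

    # Until parse() (which sets the modified time) has run, mtime() is 0
    # and can_fetch() conservatively returns False: nothing has been read.
    print(rp.mtime())                                 # 0
    print(rp.can_fetch("*", "http://example.com/"))   # False

    # Python 3 calls parse() even for an empty file; that records the
    # modified time, so an empty robots.txt ends up allowing everything.
    rp.parse([])
    print(rp.can_fetch("*", "http://example.com/"))   # True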

[issue35457] robotparser reads empty robots.txt file as "all denied"

2020-01-01 Thread Andre Burgaud
Andre Burgaud added the comment: Hi, is this ticket still relevant for Python 3.8? While running some tests with an empty robots.txt file, I realized that it was returning "ALLOWED" for any path (as per the current draft of the Robots Exclusion Protocol: https://tools.ietf.org/html/draft-kos…
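
One way to check this end to end on Python 3 is to serve an empty robots.txt from a throwaway local server; a sketch, with the handler and port handling purely for illustration:

    import http.server
    import threading
    import urllib.robotparser

    class EmptyRobotsHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            self.send_response(200)
            self.end_headers()            # empty body == empty robots.txt
        def log_message(self, *args):     # keep the output quiet
            pass

    server = http.server.HTTPServer(("127.0.0.1", 0), EmptyRobotsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()

    rp = urllib.robotparser.RobotFileParser()
    rp.set_url("http://127.0.0.1:%d/robots.txt" % server.server_port)
    rp.read()
    print(rp.can_fetch("*", "/any/path"))  # expected: True ("ALLOWED")
    server.shutdown()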

[issue35457] robotparser reads empty robots.txt file as "all denied"

2018-12-17 Thread larsfuse
larsfuse added the comment:
> (...) refers users, for file structure, to http://www.robotstxt.org/orig.html. This says nothing about the effect of an empty file, so I don't see this as a bug.

That is incorrect. From that URL you can find:
> The presence of an empty "/robots.txt" file has …

[issue35457] robotparser reads empty robots.txt file as "all denied"

2018-12-14 Thread Terry J. Reedy
Terry J. Reedy added the comment: https://docs.python.org/2.7/library/robotparser.html#module-robotparser and https://docs.python.org/3/library/urllib.robotparser.html#module-urllib.robotparser refer users, for file structure, to http://www.robotstxt.org/orig.html. This says nothing about the …

[issue35457] robotparser reads empty robots.txt file as "all denied"

2018-12-11 Thread larsfuse
New submission from larsfuse: The standard (http://www.robotstxt.org/robotstxt.html) says:
> To allow all robots complete access:
> User-agent: *
> Disallow:
> (or just create an empty "/robots.txt" file, or don't use one at all)

Here I give Python an empty file:
$ curl http://10.223.68.186/r…
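
The reported behavior can be sketched like this (the URL below is a placeholder for a server answering 200 with an empty body; on affected versions the last line prints False instead of the expected True):

    try:
        from urllib import robotparser    # Python 3
    except ImportError:
        import robotparser                # Python 2

    rp = robotparser.RobotFileParser()
    rp.set_url("http://example.com/robots.txt")
    rp.read()

    # Per the standard, an empty robots.txt means complete access, so
    # this should print True; on affected versions it prints False
    # because the empty file was never parsed and no modified time set.
    print(rp.can_fetch("*", "http://example.com/some/path"))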