New submission from Oudin <ou...@crans.org>:

When processing an ill-formed robots.txt file (like 
https://tiny.tobast.fr/robots-file ), the RobotFileParser.parse method does not 
populate the entries or default_entry attributes.
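
For example (a minimal sketch; the inline lines below stand in for a 
hypothetical ill-formed file, not the exact contents of the one linked above):

    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser()
    # Ill-formed input: the rule appears before any User-agent line, so
    # the parser's state machine silently discards everything.
    parser.parse([
        "Disallow: /private/",
        "not-a-directive",
    ])
    print(parser.entries)        # []
    print(parser.default_entry)  # None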

In my opinion, the method should raise an exception when the robots.txt file 
contains no valid User-agent entry (or contains an invalid one).
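
For illustration, this could look something like the sketch below (the 
exception name and the subclass are invented for the example; this is not a 
concrete API proposal):

    from urllib.robotparser import RobotFileParser

    class InvalidRobotsError(ValueError):
        """Raised when robots.txt yields no usable User-agent entry."""

    class StrictRobotFileParser(RobotFileParser):
        def parse(self, lines):
            super().parse(lines)
            # A well-formed file leaves at least one specific entry or a
            # default ("*") entry behind; otherwise treat it as invalid.
            if not self.entries and self.default_entry is None:
                raise InvalidRobotsError("no valid User-agent entry found")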

Otherwise, the only way to detect the problem is to check whether 
default_entry is None, which is not mentioned in the documentation 
(https://docs.python.org/dev/library/urllib.robotparser.html).
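
That is, the best a caller can do today is something like this (relying on an 
internal attribute, and assuming the file above is still served):

    from urllib.robotparser import RobotFileParser

    parser = RobotFileParser("https://tiny.tobast.fr/robots-file")
    parser.read()
    # default_entry is an internal, undocumented attribute; None here is
    # the only hint that parsing produced nothing usable.
    if parser.default_entry is None:
        print("ill-formed robots.txt: no default entry was built")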

Depending on your opinion on this, I can implement whatever is necessary and 
open a PR on GitHub.

----------
components: Library (Lib)
messages: 312711
nosy: Guinness
priority: normal
severity: normal
status: open
title: RobotFileParser.parse() should raise an exception when the robots.txt file is invalid
type: behavior
versions: Python 3.6, Python 3.7, Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32936>
_______________________________________