[issue21469] Hazards in robots.txt parser

2014-05-11 Thread Raymond Hettinger
Changes by Raymond Hettinger: Added file: http://bugs.python.org/file35216/fix_false_pos2.diff

[issue21469] Hazards in robots.txt parser

2014-05-11 Thread Raymond Hettinger
Raymond Hettinger added the comment: Updated the patch to move the modified() call to parse(). That lets the mtime update whenever rules are parsed (either by a read() or by directly parsing text).
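A minimal sketch of the behaviour described in this comment, assuming a Python version whose urllib.robotparser includes the fix. The rules passed to parse() are made up for illustration, and no network access is involved.

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
before = rp.mtime()  # stays 0 until some rules have been loaded

# Hypothetical rules, parsed directly from text rather than via read().
rp.parse(["User-agent: *", "Disallow: /private/"])

# With modified() called inside parse(), the timestamp is set here too.
after = rp.mtime()

print(before, after)
```

On an interpreter without the fix, the second value would be expected to remain 0 until modified() is called by hand.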

[issue21469] Hazards in robots.txt parser

2014-05-11 Thread Raymond Hettinger
Raymond Hettinger added the comment: Attaching a draft patch:
* Repair the broken link to norobots-rfc.txt.
* HTTP response codes >= 500 are treated as a failed read rather than as a not found. Not found means that we can assume the entire site is allowed. A 5xx server error tells us nothing.
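The status-code distinction above can be sketched as a small pure function. This is an illustration only, not the patch itself: the function name and the returned labels are hypothetical, and the 401/403 branch mirrors how the standard library's read() already treats those two codes.

```python
def classify_robots_status(code):
    """Hypothetical helper: decide what a robots.txt HTTP status implies."""
    if code in (401, 403):
        # Access to robots.txt itself is refused: assume nothing may be fetched.
        return "disallow_all"
    if 400 <= code < 500:
        # Not found (or similar): no robots.txt, so the whole site is allowed.
        return "allow_all"
    if code >= 500:
        # A server error tells us nothing: treat it as a failed read.
        return "failed_read"
    # Anything below 400 is assumed to carry a body worth parsing.
    return "parse_body"


for status in (200, 404, 403, 503):
    print(status, classify_robots_status(status))
```

Keeping "failed_read" separate from "allow_all" is the point of the change: a caller can retry later instead of assuming the site has no restrictions.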

[issue21469] Hazards in robots.txt parser

2014-05-10 Thread Raymond Hettinger
New submission from Raymond Hettinger:
* The can_fetch() method does not check whether read() has been called, so it returns false positives if read() has not been called.
* When read() is called, it fails to call modified(), so mtime() returns an incorrect result. The user has to manu
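One way a caller might guard against the first hazard is to check mtime() before trusting can_fetch(). The helper below is a hypothetical sketch, not part of the standard library; the rules and URLs are made up, and the last two results assume a Python version in which parse() updates the mtime.

```python
import urllib.robotparser


def checked_can_fetch(rp, useragent, url):
    """Hypothetical helper: answer False until rules have been loaded."""
    if not rp.mtime():
        # No read() or parse() has happened yet, so nothing is known.
        return False
    return rp.can_fetch(useragent, url)


rp = urllib.robotparser.RobotFileParser()

# Nothing loaded yet: the guard refuses rather than guessing.
unread = checked_can_fetch(rp, "*", "http://example.com/public")

# Load made-up rules from text; no network access is needed.
rp.parse(["User-agent: *", "Disallow: /private/"])
allowed = checked_can_fetch(rp, "*", "http://example.com/public")
blocked = checked_can_fetch(rp, "*", "http://example.com/private/x")

print(unread, allowed, blocked)
```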