Nikolay Bogoychev added the comment:

Thank you for the review! I have addressed your comments and released a v2 of the patch.

Highlights:
 - No longer crashes when provided with a malformed crawl-delay/robots.txt parameter.
 - Returns None when the parameter is missing or its syntax is invalid.
 - Simplified several functions.
 - Extended tests.
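For illustration, here is a minimal sketch of the behaviour described in the highlights. It assumes the crawl_delay() and request_rate() methods as they eventually shipped in urllib.robotparser (Python 3.6 and later), which may differ in detail from the v2 patch. The robots.txt content and the user-agent names are made up for the example.

```python
import urllib.robotparser

# Hypothetical robots.txt content, invented for this example.
ROBOTS_TXT = """\
User-agent: figtree
Crawl-delay: 3
Request-rate: 9/30
Disallow: /tmp

User-agent: badbot
Crawl-delay: soon
Disallow: /

User-agent: *
Disallow: /private
"""

parser = urllib.robotparser.RobotFileParser()
# parse() accepts an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Well-formed values are returned as numbers.
delay = parser.crawl_delay("figtree")
rate = parser.request_rate("figtree")

# A malformed value does not raise; it yields None.
bad_delay = parser.crawl_delay("badbot")

# A missing parameter also yields None.
missing_delay = parser.crawl_delay("otherbot")
missing_rate = parser.request_rate("otherbot")

print(delay, rate.requests, rate.seconds, bad_delay, missing_delay, missing_rate)
```

A caller should therefore treat None as "no usable value" and fall back to its own default rate.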
http://bugs.python.org/review/16099/diff/6206/Doc/library/urllib.robotparser.rst
File Doc/library/urllib.robotparser.rst (right):

http://bugs.python.org/review/16099/diff/6206/Doc/library/urllib.robotparser....
Doc/library/urllib.robotparser.rst:56: .. method:: crawl_delay(useragent)
On 2013/12/09 03:30:54, berkerpeksag wrote:
> Is crawl_delay used for search engines? Google recommends you to set crawl
> speed via Google Webmaster Tools instead.
>
> See https://support.google.com/webmasters/answer/48620?hl=en.

The crawl delay and request rate parameters are aimed at the custom crawlers that many people and companies write for specific tasks. Google Webmaster Tools applies only to Google's crawler, and web admins typically set different rates for Google/Yahoo/Bing and for all other user agents.

http://bugs.python.org/review/16099/diff/6206/Lib/urllib/robotparser.py
File Lib/urllib/robotparser.py (right):

http://bugs.python.org/review/16099/diff/6206/Lib/urllib/robotparser.py#newco...
Lib/urllib/robotparser.py:168: for entry in self.entries:
On 2013/12/09 03:30:54, berkerpeksag wrote:
> Is there a better way to calculate this? (perhaps O(1)?)

I have followed the model of what was written beforehand. An O(1) implementation (probably based on dictionaries) would require a complete rewrite of this library, as all previously implemented functions employ this logic:

    for entry in self.entries:
        if entry.applies_to(useragent):

I don't think this matters much here, as these two functions need only be called once per domain and a robots.txt file seldom contains more than 3 entries. This is why I have just followed the design laid out by the original developer.
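To make the linear lookup concrete, here is a rough standalone sketch of the pattern. The Entry class and the crawl_delay() function below are simplified, hypothetical stand-ins written for this example; they are not the library's actual code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for one robots.txt record."""
    useragents: List[str]
    delay: Optional[int] = None

    def applies_to(self, useragent: str) -> bool:
        # Match on the product token only, case-insensitively.
        name = useragent.split("/")[0].lower()
        return any(a == "*" or a.lower() in name for a in self.useragents)


def crawl_delay(entries: List[Entry], useragent: str) -> Optional[int]:
    # O(n) scan: return the delay of the first entry that applies.
    for entry in entries:
        if entry.applies_to(useragent):
            return entry.delay
    return None


entries = [Entry(["googlebot"], 10), Entry(["bingbot"], 5), Entry(["*"], 1)]
google_delay = crawl_delay(entries, "Googlebot/2.1")
other_delay = crawl_delay(entries, "mycrawler")
empty_delay = crawl_delay([], "mycrawler")
print(google_delay, other_delay, empty_delay)
```

Because matching here is by substring and wildcard rather than by exact key, a plain dictionary lookup would not be a drop-in replacement, and with only a handful of entries the scan costs very little.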
Thanks,
Nick

----------
Added file: http://bugs.python.org/file33071/robotparser_v2.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue16099>
_______________________________________