I just discovered that the "robotparser" module interprets a 403 ("Forbidden") status on a "robots.txt" file as meaning "all access disallowed". That's unexpected behavior.
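A minimal sketch of that behavior (using the Python 3 module name, urllib.robotparser; the attribute assignment below simulates what read() does internally when the fetch raises an HTTPError with code 401 or 403, rather than performing a live network fetch):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser("http://www.example.com/robots.txt")

# On a 401/403 response, robotparser's read() sets disallow_all = True,
# so every subsequent can_fetch() call returns False for every agent.
rp.disallow_all = True  # simulating the effect of a 403 on robots.txt

print(rp.can_fetch("MyBot", "http://www.example.com/page.html"))  # False
```

A crawler that wants the opposite policy (403 means "no robots.txt, crawl freely") has to catch that case itself, e.g. by fetching robots.txt directly and only handing a 200 response's body to parse().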
A major site ("http://www.aplus.net/robot.txt") serves its "robots.txt" file with a 403, so robotparser treats the entire site as off-limits. There's no formal "robots.txt" standard, unfortunately, so it's not definitively a bug.

John Nagle
SiteTruth