On Sun, Nov 03, 2019 at 10:04:46AM -0500, Gene Heskett wrote:
> Greetings all
>
> I am developing a list of broken webcrawlers who are repeatedly
> downloading my entire web site including the hidden stuff.
>
> These crawlers/bots are ignoring my robots.txt
$ wget -O - https://www.shentel.com/robots.txt
--2019-11-03 15:22:35--  https://www.shentel.com/robots.txt
Resolving www.shentel.com (www.shentel.com)... 45.60.160.21
Connecting to www.shentel.com (www.shentel.com)|45.60.160.21|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2019-11-03 15:22:36 ERROR 403: Forbidden.

Allowing said bots to *see* your robots.txt would be a step in the
right direction.

Reco
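For what it's worth, this is why the 403 matters: a well-behaved crawler
reads robots.txt before fetching anything else, and if it can't read the
file it never learns your Disallow rules. A minimal sketch with Python's
stdlib robotparser (the rules below are hypothetical, not the actual
shentel.com file):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content -- stands in for whatever the
# server would serve if it returned 200 instead of 403.
rules = """\
User-agent: *
Disallow: /hidden/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A compliant crawler consults can_fetch() before each request.
print(rp.can_fetch("*", "https://example.com/hidden/page.html"))  # False
print(rp.can_fetch("*", "https://example.com/index.html"))        # True
```

Note that RobotFileParser.read() treats a 401/403 on robots.txt itself
as "disallow everything", so polite bots are shut out entirely while
the broken ones carry on regardless.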