On Sun, Nov 03, 2019 at 10:48:58AM -0500, Gene Heskett wrote:
> On Sunday 03 November 2019 10:23:50 Reco wrote:
> > On Sun, Nov 03, 2019 at 10:04:46AM -0500, Gene Heskett wrote:
> > > Greetings all
> > >
> > > I am developing a list of broken webcrawlers that are repeatedly
> > > downloading my entire web site, including the hidden stuff.
> > >
> > > These crawlers/bots are ignoring my robots.txt
> >
> > $ wget -O - https://www.shentel.com/robots.txt
> > --2019-11-03 15:22:35--  https://www.shentel.com/robots.txt
> > Resolving www.shentel.com (www.shentel.com)... 45.60.160.21
> > Connecting to www.shentel.com (www.shentel.com)|45.60.160.21|:443... connected.
> > HTTP request sent, awaiting response... 403 Forbidden
> > 2019-11-03 15:22:36 ERROR 403: Forbidden.
> >
> > Allowing said bots to *see* your robots.txt would be a step in the
> > right direction.
>
> But you are asking for shentel.com/robots.txt, which is my ISP's site.
> You should be asking for
>
> http://geneslinuxbox.net:6309/gene/robots.txt
Wow. You, sir, owe me a new set of eyes.

I advise you to compare your monstrosity to this (a hint: it does work) - [1].

Reco

[1] https://enotuniq.net/robots.txt
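For what it's worth, a quick way to sanity-check a robots.txt policy before blaming the crawlers is Python's standard-library robotparser. This is only a sketch under assumed inputs: the actual file from the thread isn't shown, so the policy below (and the ExampleBot / example.com names) is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content -- an illustrative policy that hides
# one directory from all user agents. Replace with your real file.
robots_txt = """\
User-agent: *
Disallow: /secret/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# A well-behaved crawler would honor these answers; the broken bots in
# the thread simply never ask.
print(rp.can_fetch("ExampleBot", "http://example.com/"))          # True
print(rp.can_fetch("ExampleBot", "http://example.com/secret/x"))  # False
```

If the parser disagrees with what you intended, the file's syntax is the problem; if it agrees and the bots still crawl everything, the bots are the problem.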