On Sunday 03 November 2019 10:23:50 Reco wrote:
> On Sun, Nov 03, 2019 at 10:04:46AM -0500, Gene Heskett wrote:
> > Greetings all
> >
> > I am developing a list of broken webcrawlers who are repeatedly
> > downloading my entire web site including the hidden stuff.
> >
> > These crawlers/bots are ignoring my robots.txt
>
> $ wget -O - https://www.shentel.com/robots.txt
> --2019-11-03 15:22:35--  https://www.shentel.com/robots.txt
> Resolving www.shentel.com (www.shentel.com)... 45.60.160.21
> Connecting to www.shentel.com (www.shentel.com)|45.60.160.21|:443... connected.
> HTTP request sent, awaiting response... 403 Forbidden
> 2019-11-03 15:22:36 ERROR 403: Forbidden.
>
> Allowing said bots to *see* your robots.txt would be a step into the
> right direction.
>
> Reco

But you are asking for shentel.com/robots.txt, which is my ISP. You should be asking for
http://geneslinuxbox.net:6309/gene/robots.txt

Cheers, Gene Heskett
-- 
"There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order."
 -Ed Howdershelt (Author)
If we desire respect for the law, we must first make the law respectable.
 - Louis D. Brandeis
Genes Web page <http://geneslinuxbox.net:6309/gene>
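[Editor's note: for readers following the thread, a quick way to sanity-check what a given robots.txt actually forbids is Python's stdlib `urllib.robotparser`. This sketch parses a hypothetical rule set in-memory (the `Disallow` path below is an assumption, not the contents of Gene's real file):]

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents -- substitute the real file's rules.
rules = """\
User-agent: *
Disallow: /gene/private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# A well-behaved crawler must skip the disallowed subtree...
print(rp.can_fetch("*", "http://geneslinuxbox.net:6309/gene/private/x.html"))  # False
# ...but is free to fetch everything else.
print(rp.can_fetch("*", "http://geneslinuxbox.net:6309/gene/index.html"))  # True
```

Of course, this only tells you what compliant crawlers should do; the bots in question ignore the file entirely, so enforcement has to happen server-side (e.g. firewall rules or user-agent filtering).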