bug#52338: Crawler bots are downloading substitutes

2021-12-19 Thread Mathieu Othacehe
> Thanks to both of you,

And closing!

Mathieu

bug#52338: Crawler bots are downloading substitutes

2021-12-11 Thread Mathieu Othacehe
Hey,

The Cuirass web interface logs were quite silent this morning and I suspected an issue somewhere. I then realized that you had updated the Nginx conf and the bots were no longer knocking at our door, which is great!

Thanks to both of you,

Mathieu

bug#52338: Crawler bots are downloading substitutes

2021-12-10 Thread Tobias Geerinckx-Rice via Bug reports for GNU Guix
All,

Mark H Weaver wrote:
> For what it's worth: during the years that I administered Hydra, I found
> that many bots disregarded the robots.txt file that was in place there.
> In practice, I found that I needed to periodically scan the access logs
> for bots and forcefully block their requests in order […]
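For illustration only (this is not a configuration from the thread): one way to express that kind of manual block in Guix's nginx service is a raw server-level rule matching the offending User-Agent. The raw-content field usage and the "SomeBadBot" pattern below are assumptions, not something deployed on ci.guix.gnu.org.

    (use-modules (gnu services web))

    ;; Hedged sketch: return 403 to a crawler identified from the access
    ;; logs.  raw-content splices the string verbatim into the server
    ;; block; "SomeBadBot" is a placeholder pattern.
    (nginx-server-configuration
     ;; ... listen, server-name, locations, etc. elided ...
     (raw-content
      (list "if ($http_user_agent ~* \"SomeBadBot\") { return 403; }")))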

bug#52338: Crawler bots are downloading substitutes

2021-12-10 Thread Mark H Weaver
Hi Leo,

Leo Famulari writes:
> I noticed that some bots are downloading substitutes from
> ci.guix.gnu.org.
>
> We should add a robots.txt file to reduce this waste.
>
> Specifically, I see bots from Bing and Semrush:
>
> https://www.bing.com/bingbot.htm
> https://www.semrush.com/bot.html

For what it's worth […]

bug#52338: Crawler bots are downloading substitutes

2021-12-10 Thread Tobias Geerinckx-Rice via Bug reports for GNU Guix
Leo Famulari wrote:
> Alright, I leave it up to you.

Dammit.

Kind regards,

T G-R

bug#52338: Crawler bots are downloading substitutes

2021-12-10 Thread Leo Famulari
On Thu, Dec 09, 2021 at 04:42:24PM +0100, Tobias Geerinckx-Rice wrote:
[...]
> An alternative to that is to serve a real on-disc robots.txt.

Alright, I leave it up to you. I just want to prevent bots from downloading substitutes. I don't really have opinions about any of the details.

bug#52338: Crawler bots are downloading substitutes

2021-12-09 Thread Tobias Geerinckx-Rice via Bug reports for GNU Guix
Mathieu Othacehe wrote:
> Hello Leo,
>
>> +          (nginx-location-configuration
>> +           (uri "/robots.txt")

It's a micro-optimisation, but it can't hurt to generate ‘location = /robots.txt’ instead of ‘location /robots.txt’ here.

>> +           (body
>> +            (list
>> +[…]
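A sketch of that suggestion, assuming the uri field is pasted verbatim after nginx's "location" keyword so that prefixing it with "= " yields an exact-match block (body abridged from the quoted patch):

    (nginx-location-configuration
     (uri "= /robots.txt")   ; exact match instead of prefix match
     (body
      (list "return 200 \"User-agent: *\nDisallow: /nar/\n\";")))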

bug#52338: Crawler bots are downloading substitutes

2021-12-09 Thread Mathieu Othacehe
Hello Leo,

> +          (nginx-location-configuration
> +           (uri "/robots.txt")
> +           (body
> +            (list
> +             "add_header Content-Type text/plain;"
> +             "return 200 \"User-agent: *\nDisallow: /nar/\n\";"))

Nice, the bots are al[…]
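For context, a minimal sketch of where such a location block might sit. Only the robots.txt location comes from the quoted patch; the server name and the rest of the nginx-server-configuration are illustrative assumptions:

    (use-modules (gnu services web))

    (nginx-server-configuration
     (server-name '("ci.guix.gnu.org"))   ; assumed for illustration
     (locations
      (list
       (nginx-location-configuration
        (uri "/robots.txt")
        (body
         (list
          "add_header Content-Type text/plain;"
          "return 200 \"User-agent: *\nDisallow: /nar/\n\";"))))))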

bug#52338: Crawler bots are downloading substitutes

2021-12-06 Thread Leo Famulari
I noticed that some bots are downloading substitutes from ci.guix.gnu.org.

We should add a robots.txt file to reduce this waste.

Specifically, I see bots from Bing and Semrush:

https://www.bing.com/bingbot.htm
https://www.semrush.com/bot.html
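Per the patch discussed earlier on this page, the file served at /robots.txt boils down to:

    User-agent: *
    Disallow: /nar/

which asks compliant crawlers to skip the /nar/ substitute URLs while leaving the rest of the site crawlable.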