Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-03 Thread matthew sporleder
On Thu, Jul 3, 2025 at 3:38 PM Constantine A. Murenin wrote: > > On Thu, 3 Jul 2025 at 04:30, Jörg Sonnenberger wrote: > > > > On 7/3/25 6:23 AM, Constantine A. Murenin wrote: > > > These AIs literally behave the exact same way as humans; they're > > > simply dumber and more persistent. The way

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-03 Thread Constantine A. Murenin
On Thu, 3 Jul 2025 at 04:30, Jörg Sonnenberger wrote: > > On 7/3/25 6:23 AM, Constantine A. Murenin wrote: > > These AIs literally behave the exact same way as humans; they're > > simply dumber and more persistent. The way CVSweb is designed, it's > > easily DoS'able with the default `wget -r` an

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-03 Thread Mouse
>> Another possible reason is that I don't speak HTTPS; I consider it >> plausible the LLM scrapers have drunk the "HTTPS is the One True Way" >> koolaid and aren't even trying HTTP. > I am seeing the exact opposite from GPTBot: [...] Fascinating! So - at least for that one - there must be some ot

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-03 Thread Constantine A. Murenin
On Thu, 3 Jul 2025 at 10:18, Mouse wrote: > Actually, most offenders of type (1) usually just go into the automated > list, because I don't use the top and bottom addresses of my netblock > for anything but scanner sentinels; anyone trying to access them goes > into the automated list. Most addre

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-03 Thread Christof Meerwald
On Thu, Jul 03, 2025 at 11:17:51AM -0400, Mouse wrote: > Another possible reason is that I don't speak HTTPS; I consider it > plausible the LLM scrapers have drunk the "HTTPS is the One True Way" > koolaid and aren't even trying HTTP. I am seeing the exact opposite from GPTBot: it tries http on eve

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-03 Thread Mouse
> [...] fairly typical load but I've documented surges upwards of 2500+ > new TCP connections per second; I typically end up banning an entire > /16 or two to recover my VM when it happens. One of the front-runners in my mind for why I'm not being DDoSed similarly is that my main house router has

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-03 Thread Aaron B.
On Thu, 3 Jul 2025 11:59:23 +0200 Christof Meerwald wrote: > > (1) They have no effective rate limiting mechanism on the origin side. > > (2) They are intentionally distributing requests to avoid server side rate > > limits. > > (3) The combination of the two makes most caching useless. > > (3) T

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-03 Thread Mouse
[Christof] > Personally, I am seeing gptbot crawling at a rate of up to about 1 > request per second. [Joerg] > My own web sites see moderate scraper traffic, but they don't have a > large site graph either. On various other sites like anonhg.n.o, the > main Mercurial bug tracker etc we have been

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-03 Thread Christof Meerwald
On Thu, Jul 03, 2025 at 11:30:48AM +0200, Jörg Sonnenberger wrote: > On 7/3/25 6:23 AM, Constantine A. Murenin wrote: > > Can you really blame kids for looking at all 5000 links from a single > > file, when you give them 5000 links to start with? Maybe start by not > > giving the 5000 unique links

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-03 Thread Jörg Sonnenberger
On 7/3/25 6:23 AM, Constantine A. Murenin wrote: Can you really blame kids for looking at all 5000 links from a single file, when you give them 5000 links to start with? Maybe start by not giving the 5000 unique links from a single file, and implement caching / throttling? How could you know th

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-02 Thread Thor Lancelot Simon
On Wed, Jul 02, 2025 at 11:23:21PM -0500, Constantine A. Murenin wrote: > > BSD licence is also a very permissive licence; when people compile > this code and distribute the binaries, they aren't required to "keep > the licence", either, so, how is this different? Besides, unlike GPL The above s

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-02 Thread Constantine A. Murenin
On Wed, 2 Jul 2025 at 22:20, matthew green wrote: > > > Why would we NOT want to have AI train on our source code? > > they abuse services - DDoS sites constantly. many projects have been > restricting access because otherwise they're not available to humans. > > they don't keep licenses on code.

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-02 Thread matthew green
> Why would we NOT want to have AI train on our source code? they abuse services - DDoS sites constantly. many projects have been restricting access because otherwise they're not available to humans. they don't keep licenses on code. .mrg.

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-07-02 Thread Constantine A. Murenin
On Sun, 8 Jun 2025 at 11:42, Mouse wrote: > > >> I can't easily check -current, because HTTP access to cvsweb has > >> been broken; it now insists on trying to ram HTTPS down my throat. > > Side note: it is far worse than http vs. https, it uses www/anubis > > [...JavaScript worker threads...sha25

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-06-08 Thread Mouse
>> I can't easily check -current, because HTTP access to cvsweb has >> been broken; it now insists on trying to ram HTTPS down my throat. > Side note: it is far worse than http vs. https, it uses www/anubis > [...JavaScript worker threads...sha256...] > Unfortunately this kind of drastic measures

Re: cvsweb anti-bot protection (was: Retrieving MAC address from struct ifnet)

2025-06-08 Thread Martin Husemann
On Sun, Jun 08, 2025 at 11:07:26AM -0400, Mouse wrote: > [..] I can't easily > check -current, because HTTP access to cvsweb has been broken; it now > insists on trying to ram HTTPS down my throat. Side note: it is far worse than http vs. https, it uses www/anubis from pkgsrc to verify you are ac