On Thu, Jul 3, 2025 at 3:38 PM Constantine A. Murenin wrote:
>
> On Thu, 3 Jul 2025 at 04:30, Jörg Sonnenberger wrote:
> >
> > On 7/3/25 6:23 AM, Constantine A. Murenin wrote:
> > > These AIs literally behave the exact same way as humans; they're
> > > simply dumber and more persistent. The way
On Thu, 3 Jul 2025 at 04:30, Jörg Sonnenberger wrote:
>
> On 7/3/25 6:23 AM, Constantine A. Murenin wrote:
> > These AIs literally behave the exact same way as humans; they're
> > simply dumber and more persistent. The way CVSweb is designed, it's
> > easily DoS'able with the default `wget -r` an
>> Another possible reason is that I don't speak HTTPS; I consider it
>> plausible the LLM scrapers have drunk the "HTTPS is the One True Way"
>> koolaid and aren't even trying HTTP.
> I am seeing the exact opposite from GPTBot: [...]
Fascinating! So - at least for that one - there must be some ot
On Thu, 3 Jul 2025 at 10:18, Mouse wrote:
> Actually, most offenders of type (1) usually just go into the automated
> list, because I don't use the top and bottom addresses of my netblock
> for anything but scanner sentinels; anyone trying to access them goes
> into the automated list. Most addre
On Thu, Jul 03, 2025 at 11:17:51AM -0400, Mouse wrote:
> Another possible reason is that I don't speak HTTPS; I consider it
> plausible the LLM scrapers have drunk the "HTTPS is the One True Way"
> koolaid and aren't even trying HTTP.
I am seeing the exact opposite from GPTBot: it tries http on eve
> [...] fairly typical load but I've documented surges upwards of 2500+
> new TCP connections per second; I typically end up banning an entire
> /16 or two to recover my VM when it happens.
One of the front-runners in my mind for why I'm not being DDoSed
similarly is that my main house router has
On Thu, 3 Jul 2025 11:59:23 +0200 Christof Meerwald wrote:
> > (1) They have no effective rate limiting mechanism on the origin side.
> > (2) They are intentionally distributing requests to avoid server side rate
> > limits.
> > (3) The combination of the two makes most caching useless.
> > (3) T
[Christof]
> Personally, I am seeing GPTBot crawling at a rate of up to about 1
> request per second.
[Joerg]
> My own web sites see moderate scraper traffic, but they don't have a
> large site graph either. On various other sites like anonhg.n.o, the
> main Mercurial bug tracker etc we have been
On Thu, Jul 03, 2025 at 11:30:48AM +0200, Jörg Sonnenberger wrote:
> On 7/3/25 6:23 AM, Constantine A. Murenin wrote:
> > Can you really blame kids for looking at all 5000 links from a single
> > file, when you give them 5000 links to start with? Maybe start by not
> > giving the 5000 unique links
On 7/3/25 6:23 AM, Constantine A. Murenin wrote:
> Can you really blame kids for looking at all 5000 links from a single
> file, when you give them 5000 links to start with? Maybe start by not
> giving the 5000 unique links from a single file, and implement caching
> / throttling? How could you know th
On Wed, Jul 02, 2025 at 11:23:21PM -0500, Constantine A. Murenin wrote:
>
> BSD licence is also a very permissive licence; when people compile
> this code and distribute the binaries, they aren't required to "keep
> the licence", either, so, how is this different? Besides, unlike GPL
The above s
On Wed, 2 Jul 2025 at 22:20, matthew green wrote:
>
> > Why would we NOT want to have AI train on our source code?
>
> they abuse services - DDoS sites constantly. many projects have been
> restricting access because otherwise they're not available to humans.
>
> they don't keep licenses on code.
> Why would we NOT want to have AI train on our source code?
they abuse services - DDoS sites constantly. many projects have been
restricting access because otherwise they're not available to humans.
they don't keep licenses on code.
.mrg.
On Sun, 8 Jun 2025 at 11:42, Mouse wrote:
>
> >> I can't easily check -current, because HTTP access to cvsweb has
> >> been broken; it now insists on trying to ram HTTPS down my throat.
> > Side note: it is far worse than http vs. https, it uses www/anubis
> > [...JavaScript worker threads...sha25
>> I can't easily check -current, because HTTP access to cvsweb has
>> been broken; it now insists on trying to ram HTTPS down my throat.
> Side note: it is far worse than http vs. https, it uses www/anubis
> [...JavaScript worker threads...sha256...]
> Unfortunately this kind of drastic measures
On Sun, Jun 08, 2025 at 11:07:26AM -0400, Mouse wrote:
> [..] I can't easily
> check -current, because HTTP access to cvsweb has been broken; it now
> insists on trying to ram HTTPS down my throat.
Side note: it is far worse than http vs. https, it uses www/anubis
from pkgsrc to verify you are ac
16 matches