Re: Package statistics by downloads

2025-05-03 Thread Erik Schulz
mber. What number do you suggest?

On Sat, May 3, 2025 at 11:39 AM Adam D. Barratt wrote:
>
> On Sat, 2025-05-03 at 11:16 +0200, Erik Schulz wrote:
> > I suspect that compliance with GDPR would require the data to be
> > stored minimally.
> > It seems reasonable to m

Re: Package statistics by downloads

2025-05-03 Thread Erik Schulz
I suspect that compliance with GDPR would require the data to be stored minimally. It seems reasonable to me that a 24-hour window would reduce most repeat-downloads. If you stream the request log and reduce to (ip,package,version), it will be minimal. I think it would fit into memory, e.g. 10 mill
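The streaming reduction described above can be sketched in Python. This is a minimal illustration under assumptions the thread does not state. The log is taken to be already parsed into hypothetical `(timestamp, ip, package, version)` tuples arriving in time order. Fixed UTC calendar days stand in for the 24-hour window; a true sliding window would need a last-seen timestamp per key instead. Only the current day's `(ip, package, version)` keys are held in memory, and only aggregate counts outlive the day boundary.

```python
from collections import Counter
from datetime import datetime, timezone
from typing import Iterable, Tuple

# Hypothetical parsed log record: (unix timestamp, client IP, package, version).
# This layout is an assumption for illustration, not a real mirror log format.
Record = Tuple[float, str, str, str]


def daily_unique_downloads(records: Iterable[Record]) -> Counter:
    """Count downloads per (UTC day, package, version), at most once per IP per day.

    Assumes records arrive in timestamp order, as in a streamed request log.
    The set of seen keys is cleared at each day boundary, so client IPs are
    never kept longer than one day; only the aggregated counts are returned.
    """
    counts: Counter = Counter()
    current_day = None
    seen = set()  # (ip, package, version) keys seen during current_day
    for ts, ip, package, version in records:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        if day != current_day:
            seen.clear()  # discard the previous day's IPs
            current_day = day
        key = (ip, package, version)
        if key not in seen:  # repeat downloads within the same day are ignored
            seen.add(key)
            counts[(day.isoformat(), package, version)] += 1
    return counts
```

Because the deduplication key includes the IP, repeated fetches from one client on the same day count once, while the same client counts again on a new day.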

Re: Package statistics by downloads

2025-05-02 Thread Erik Schulz
> misguided popularity

I would argue a more objective description is that the measurement has bias, i.e.:
- repeat-download bias;
- external-download bias, when using mirrors;
- false-download bias, when malicious actors try to manipulate the value, for example using many IPs.

I agree that install

Package statistics by downloads

2025-04-23 Thread Erik Schulz
I'm interested in package popularity. I'm aware of popcon (https://popcon.debian.org/), but I'm more interested in actual downloads. Do the Debian mirrors track unique downloads (e.g. by hashed IP address), and if not, why not? I can understand the privacy argument, but arguably package downloads a
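Tracking unique downloads "by hashed IP address" needs some care: a plain unsalted hash of an IPv4 address can be inverted by hashing the whole 2^32 address space. A common alternative is a keyed hash whose secret is rotated and then destroyed. The sketch below is only an assumption about how this could be done, not something the Debian mirrors are known to implement; the class and method names are hypothetical. It uses HMAC-SHA256 from the Python standard library and normalizes addresses so that different spellings of the same IPv6 address map to the same pseudonym.

```python
import hashlib
import hmac
import ipaddress
import secrets


class IpPseudonymizer:
    """Hypothetical helper: replace client IPs with keyed hashes (HMAC-SHA256).

    The secret key exists only in memory. Calling rotate() (e.g. once a day)
    replaces it, so pseudonyms from different periods cannot be linked, and
    once the old key is gone they cannot be brute-forced back to IPs.
    """

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def rotate(self) -> None:
        # Replace the key; the previous one is dropped and not stored anywhere.
        self._key = secrets.token_bytes(32)

    def pseudonymize(self, ip: str) -> str:
        # Normalize first, so e.g. "2001:db8::1" and its fully expanded form match.
        canonical = ipaddress.ip_address(ip).compressed.encode("ascii")
        return hmac.new(self._key, canonical, hashlib.sha256).hexdigest()
```

The pseudonym could then replace the raw IP in a per-day deduplication key, which keeps repeat-download filtering working within one key period while avoiding storage of the addresses themselves.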