Re: Package statistics by downloads

2025-05-26 Thread Julien Plissonneau Duquène
Hi, I would be interested in per-package-and-version download statistics and trends as well.
On 2025-05-03 09:28, Philipp Kern wrote:
> The problem is that we currently do not want to retain this data.
You're absolutely right here: there is no point in retaining the raw data, it gets sta

Re: Package statistics by downloads

2025-05-03 Thread Erik Schulz
Memory usage approximations, per tuple:
  ipv6 = 16
  package pointer = 3 (assuming <16777216 packages)
  version pointer = 2 (assuming <65536 distinct version names)
  + some overhead
  => ~ 40 B seems fair?
But you could also just write to disk. It'll wear out an SSD though, and random r/w on a hard drive i
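The per-tuple arithmetic above can be checked with a short sketch. The field sizes and the ~40 B figure come from the message; the function names and the tuple count used in the example are hypothetical and only illustrate how the estimate scales.

```python
# Field sizes in bytes, as listed in the message above.
IPV6_BYTES = 16
PACKAGE_POINTER_BYTES = 3   # 3 bytes address fewer than 2**24 = 16777216 packages
VERSION_POINTER_BYTES = 2   # 2 bytes address fewer than 2**16 = 65536 version names
BYTES_PER_TUPLE = 40        # the message's rounded figure, overhead included


def raw_tuple_bytes() -> int:
    """Payload size of one (ip, package, version) tuple, without overhead."""
    return IPV6_BYTES + PACKAGE_POINTER_BYTES + VERSION_POINTER_BYTES


def estimated_total_bytes(tuple_count: int, per_tuple: int = BYTES_PER_TUPLE) -> int:
    """Rough memory needed to hold tuple_count tuples."""
    return tuple_count * per_tuple


if __name__ == "__main__":
    hypothetical_tuples = 1_000_000  # assumption for illustration, not a measured value
    print(raw_tuple_bytes(), "bytes of payload per tuple")
    print(estimated_total_bytes(hypothetical_tuples) / 1e6, "MB in total")
```

The payload alone is 21 bytes, so the 40 B estimate leaves roughly half of each entry for container overhead. A plain Python set of tuples would use considerably more than that per entry, so the figure presumably assumes a packed representation.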

Re: Package statistics by downloads

2025-05-03 Thread Adam D. Barratt
On Sat, 2025-05-03 at 11:16 +0200, Erik Schulz wrote:
> I suspect that compliance with GDPR would require the data to be
> stored minimally.
> It seems reasonable to me that a 24-hour window would reduce most
> repeat-downloads.
> If you stream the request log and reduce to (ip,package,version), it

Re: Package statistics by downloads

2025-05-03 Thread Peter B
On 03/05/2025 02:35, Otto Kekäläinen wrote: I am also interested in usage statistics. I feel it is much more meaningful to work on packages that I know have a lot of users. +1 While neither popcon nor download stats are accurate, they still show trends and relative numbers which can be used

Re: Package statistics by downloads

2025-05-03 Thread Erik Schulz
I suspect that compliance with GDPR would require the data to be stored minimally. It seems reasonable to me that a 24-hour window would reduce most repeat-downloads. If you stream the request log and reduce to (ip,package,version), it will be minimal. I think it would fit into memory, e.g. 10 mill
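A minimal sketch of the reduction described above, assuming the request log has already been parsed into (timestamp, ip, package, version) records. The class name, the method names, and the choice of a fixed window that resets every 24 hours (rather than a sliding one) are assumptions made for illustration; nothing here reflects how the Debian mirrors actually process logs.

```python
from collections import Counter

WINDOW_SECONDS = 24 * 60 * 60  # the 24-hour window suggested in the message


class DownloadCounter:
    """Counts downloads per (package, version), ignoring repeats from one IP.

    Only the current window's (ip, package, version) tuples are held in
    memory; they are discarded when the window rolls over, so no raw
    request data is retained beyond 24 hours.
    """

    def __init__(self, window: int = WINDOW_SECONDS) -> None:
        self.window = window
        self.window_start = None
        self.seen = set()        # tuples seen in the current window
        self.counts = Counter()  # aggregate counts, free of IP addresses

    def observe(self, timestamp: float, ip: str, package: str, version: str) -> bool:
        """Record one request; return True if it was counted as new."""
        if self.window_start is None or timestamp - self.window_start >= self.window:
            # Start a fresh window and forget every IP seen so far.
            self.window_start = timestamp
            self.seen.clear()
        key = (ip, package, version)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.counts[(package, version)] += 1
        return True


if __name__ == "__main__":
    counter = DownloadCounter()
    counter.observe(0, "2001:db8::1", "hello", "1.0")
    counter.observe(60, "2001:db8::1", "hello", "1.0")  # repeat, not counted
    counter.observe(120, "2001:db8::2", "hello", "1.0")
    print(counter.counts)
```

Only the aggregate counts outlive a window, which is the part relevant to the data-minimisation concern. A download repeated just after a rollover is counted again, which is a limitation of the fixed-window simplification.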

Re: Package statistics by downloads

2025-05-03 Thread Philipp Kern
On 2025-05-03 03:35, Otto Kekäläinen wrote: I'm interested in package popularity. I'm aware of popcon (https://popcon.debian.org/), but I'm more interested in actual downloads. I am also interested in usage statistics. I feel it is much more meaningful to work on packages that I know have a

Re: Package statistics by downloads

2025-05-02 Thread Otto Kekäläinen
> I'm interested in package popularity. I'm aware of popcon
> (https://popcon.debian.org/), but I'm more interested in actual
> downloads.
I am also interested in usage statistics. I feel it is much more meaningful to work on packages that I know have a lot of users. While neither popcon nor d

Re: Package statistics by downloads

2025-05-02 Thread Erik Schulz
> misguided popularity
I would argue a more objective description is that the measurement has bias, i.e.:
- repeat-download bias.
- external-download bias, when using mirrors.
- false-download bias, when malicious actors try to manipulate the value, for example using many IPs.
I agree that install

Re: Package statistics by downloads

2025-05-01 Thread Salvo Tomaselli
I presume to do some misguided popularity ranking like pypi does, by counting the number of downloads. It works terribly because large organizations that actually download it many times will set up internal mirrors, so there is no chance for the value to have any meaning. Also on pypi and similar

Re: Package statistics by downloads

2025-04-23 Thread Philipp Kern
On 2025-04-23 10:08, Erik Schulz wrote: I'm interested in package popularity. I'm aware of popcon (https://popcon.debian.org/), but I'm more interested in actual downloads. What would this be useful for? You only described technical details, not why we would want to do this. Kind regards Phi