> misguided popularity

I would argue a more objective description is that the measurement has bias, i.e.:
- repeat-download bias.
- external-download bias, when using mirrors.
- false-download bias, when malicious actors try to manipulate the value, for example by using many IPs.

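A toy sketch of the repeat-download bias, using a made-up log and hypothetical function names (this is not how popcon or any mirror actually counts anything):

```python
from collections import Counter

# Hypothetical download log as (client_ip, package) pairs. All values are made up.
log = [
    ("10.0.0.1", "foo"),  # one machine fetching foo repeatedly
    ("10.0.0.1", "foo"),
    ("10.0.0.1", "foo"),
    ("10.0.0.2", "foo"),
    ("10.0.0.3", "bar"),
    ("10.0.0.4", "bar"),
    ("10.0.0.5", "bar"),
]

def raw_counts(entries):
    """Count every download, repeats included."""
    return Counter(pkg for _ip, pkg in entries)

def unique_client_counts(entries):
    """Count each (client, package) pair only once."""
    return Counter(pkg for _ip, pkg in set(entries))

raw = raw_counts(log)
unique = unique_client_counts(log)
print(raw.most_common())
print(unique.most_common())
```

By raw count foo ranks above bar, but by unique clients bar ranks above foo. Deduplicating by IP only addresses the first bias: downloads served by an internal mirror never reach the log at all, and an attacker using many IPs still looks like many clients.
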
I agree that installation-counting popcon avoids the first two, but it also suffers from 'willing-participant' bias. I have no idea how severe this bias is, so I have to theorize: maybe server installations are heavily underrepresented; it also doesn't count the privacy-conscious, and maybe not basic users who don't understand what it does and just say no. I.e. entire classes of users are missing. So download-counting popcon may at least provide some new insights.

> to just download the files many times to increase the popularity

I assume this attack applies to popcon as well? It would be trivial to push false numbers. I'm not familiar with how it works, but if it just pushes a list of installed packages, then it is even more trivial to manipulate the numbers.

The unfortunate conclusion is that we can't rely on these numbers to track actual popularity, only whether a package is likely being used. I.e. packages with very low numbers may be given lower priority on mirrors, which can be relevant for long-term archives (e.g. all packages ever used in Debian for 20 years) that may prefer to archive only packages that are more than 0.001% likely to be used. If servers are grossly underrepresented in the sample, the popcon data may be unreliable for this use case. Download-counting popcon would at worst include unused packages; in the worst case, someone fake-downloading every single package would render this statistic very hard to use.

On Fri, May 2, 2025 at 1:28 AM Salvo Tomaselli <tipos...@tiscali.it> wrote:
>
> I presume do some misguided popularity ranking like pypi does, by counting the
> number of downloads.
>
> It works terribly because large organizations that actually download it many
> times will set up internal mirrors, so there is no chance for the value to
> have any meaning.
>
> Also on pypi and similar there's an incentive to just download the files many
> times to increase the popularity (I provide a very nice tool to do that
> without consuming too much bandwidth, on my codeberg).
>
> Plus of course, how would we even aggregate all the download counts from all
> the mirrors?
>
> Best
>
> --
> Salvo Tomaselli
>
> "Io non mi sento obbligato a credere che lo stesso Dio che ci ha dotato di
> senso, ragione ed intelletto intendesse che noi ne facessimo a meno."
> -- Galileo Galilei
>
> https://ltworf.codeberg.page/