> misguided popularity

I would argue that a more objective description is that the
measurement is biased, in at least three ways:
- repeat-download bias.
- external-download bias, when using mirrors.
- false-download bias, when malicious actors try to manipulate the
value, for example by downloading from many IPs.

I agree that installation-counting popcon avoids the first two, but
it suffers from 'willing-participant' bias instead. I have no idea how
severe this bias is, so I have to theorize: maybe server installations
are heavily underrepresented; it also doesn't count the
privacy-conscious, and maybe not basic users who don't understand
what it does and just say no.
I.e. entire classes of users may be missing from the sample.
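
To illustrate how much an opt-in sample can distort a share, here is a
minimal sketch. Every number in it (machine counts, install shares,
opt-in rates) and both function names are made up for illustration;
none of it comes from real popcon data.

```python
# Hypothetical classes of machines, purely illustrative:
# name -> (machines, share with the package installed, opt-in rate)
classes = {
    "desktop": (1000, 0.10, 0.30),
    "server": (1000, 0.60, 0.02),
}

def true_share(classes):
    """Share of all machines that have the package installed."""
    total = sum(n for n, _, _ in classes.values())
    installed = sum(n * p for n, p, _ in classes.values())
    return installed / total

def observed_share(classes):
    """Share seen by a survey that only counts machines that opted in."""
    reporting = sum(n * r for n, _, r in classes.values())
    installed = sum(n * p * r for n, p, r in classes.values())
    return installed / reporting

print(f"true share:     {true_share(classes):.3f}")
print(f"observed share: {observed_share(classes):.3f}")
```

With these assumed numbers the package is on 35% of machines but
shows up on only about 13% of the reporting ones, because the class
that uses it most rarely opts in.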

So the download-counting popcon may at least provide some new insights.

> to just download the files many times to increase the popularity

I assume this attack applies to popcon as well? It would be trivial to
push false numbers. I'm not familiar with how it works, but if it just
pushes a list of installed packages, then it is even more trivial to
manipulate the numbers.

The unfortunate conclusion is that we can't rely on these numbers to
track actual popularity, only whether a package is likely being used.
I.e. packages with very low numbers may be given lower priority on
mirrors, which can be relevant for long-term archives (e.g. all
packages ever used in Debian for 20 years) that may prefer to only
archive packages that are more than 0.001% likely to be used.
If servers are grossly underrepresented in the sample, the popcon
data may be unreliable for this use case.
Download-counting popcon, on the other hand, would at worst include
unused packages.
In the worst case, someone fake-downloading every single package would
render this statistic very hard to use.
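
The threshold idea and its worst case can be sketched as follows. This
is a rough illustration, assuming "0.001% likely to be used" is read
as a fraction of the machines in the sample; the package names, the
counts and the function `archive_candidates` are all hypothetical.

```python
THRESHOLD = 0.001 / 100  # 0.001 %, read here as a fraction of the sample

def archive_candidates(counts, sample_size, threshold=THRESHOLD):
    """Return the packages whose share of the sample exceeds the threshold."""
    return sorted(
        name for name, count in counts.items()
        if count / sample_size > threshold
    )

# Made-up counts for three made-up packages.
counts = {"pkg-common": 50_000, "pkg-niche": 3, "pkg-unused": 0}
sample_size = 200_000

honest = archive_candidates(counts, sample_size)

# Worst case: someone adds a few fake downloads to every package.
inflated = {name: count + 10 for name, count in counts.items()}
attacked = archive_candidates(inflated, sample_size)

print(honest)
print(attacked)
```

Under these assumptions the honest counts drop only the unused
package, while ten fake downloads per package are enough to push
everything over the threshold, so the filter no longer excludes
anything.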


On Fri, May 2, 2025 at 1:28 AM Salvo Tomaselli <tipos...@tiscali.it> wrote:
>
> I presume do some misguided popularity ranking like pypi does, by counting the
> number of downloads.
>
> It works terribly because large organizations that actually download it many
> times will set up internal mirrors, so there is no chance for the value to
> have any meaning.
>
> Also on pypi and similar there's an incentive to just download the files many
> times to increase the popularity (I provide a very nice tool to do that
> without consuming too much bandwidth, on my codeberg).
>
> Plus of course, how would we even aggregate all the download counts from all
> the mirrors?
>
> Best
>
>
> --
> Salvo Tomaselli
>
> "Io non mi sento obbligato a credere che lo stesso Dio che ci ha dotato di
> senso, ragione ed intelletto intendesse che noi ne facessimo a meno."
>                 -- Galileo Galilei
>
> https://ltworf.codeberg.page/
>
>
