On Sun, 27 Dec 2020 at 17:52, Matthew Miller <mat...@fedoraproject.org>
wrote:

> On Sun, Dec 27, 2020 at 07:44:57PM +0100, clime wrote:
> > I think we can simply parse server-side access logs to count package
> > downloads, no?
>
> We can for our primary server, but most people get updates from mirrors
> which we don't run directly. The central mirrorlist (from which I get the
> dnf count data) just redirects people to those mirrors. Even if we could
> get package download counts from the mirrors, they're heavily skewed by:
>
> * public mirrors pulling the whole thing
> * people pulling the whole thing for a private mirror
> * ci and build systems (like, running mock)
> * mysterious bots downloading stuff for whatever reason
> * proxies and caching
>
>
There are a couple of other issues which make it hard to get useful numbers
even from our primary servers. When you look at the logs, there is nothing
that indicates whether a package is being installed, updated, or pulled in
as a dependency. This means that any stats will mostly show which packages
get updated the most during a release, or which have a lot of sub-packages
that might get pulled in.
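
To make that concrete, here is a minimal sketch (not anything we actually
run) of counting RPM downloads from an access log. It assumes an
Apache-style "combined" log format; the helper name count_rpm_downloads,
the sample lines, and the package names are all made up for illustration.
The point is that a log line only gives you a client address, a path, and a
status, so a count is all you can get out of it:

```python
import re
from collections import Counter

# Assumption: Apache/nginx "combined" log format; real mirror logs may differ.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

def count_rpm_downloads(lines):
    """Count successful GETs per .rpm file name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if not m:
            continue  # skip lines that are not in the assumed format
        if m.group("method") != "GET" or m.group("status") != "200":
            continue  # only count completed downloads
        path = m.group("path")
        if path.endswith(".rpm"):
            # Key on the file name; nothing in the line says whether this
            # was an install, an update, or a dependency being pulled in.
            counts[path.rsplit("/", 1)[-1]] += 1
    return counts

# Made-up sample lines, for illustration only.
sample = [
    '192.0.2.10 - - [27/Dec/2020:17:52:00 +0000] '
    '"GET /example/Packages/f/foo-1.0-1.fc33.x86_64.rpm HTTP/1.1" 200 3500 "-" "libdnf"',
    '192.0.2.11 - - [27/Dec/2020:17:52:05 +0000] '
    '"GET /example/Packages/f/foo-1.0-1.fc33.x86_64.rpm HTTP/1.1" 200 3500 "-" "libdnf"',
    '192.0.2.11 - - [27/Dec/2020:17:52:06 +0000] '
    '"GET /example/repodata/repomd.xml HTTP/1.1" 200 4000 "-" "libdnf"',
]

if __name__ == "__main__":
    for name, n in count_rpm_downloads(sample).most_common():
        print(n, name)
```

Both requests for the made-up foo package look identical here, whether one
was a user installing it and the other a CI job or a mirror sync.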

Mirroring also adds noise: a client may get some of its packages from one
mirror and then get mostly dependencies from a second mirror.

Finally, CI and build systems swamp all other downloads from mirrors these
days. Depending on how they are set up, some seem to do a ```yum install *```
before operating. My guess is that at least 60% of all traffic is CI these
days. (I expect that this is also the case for a lot of other
distributions.)

Packages with lots of updates sound like they might be getting more
interest, but you have a lot of upstreams who do two-week sprint releases,
which means there are lots of regular updates.

All in all, what you get by looking at a mirror's data is a 'reverse
popularity contest'. Packages like the kernel, glibc, firefox, and every
dependency which gets an update sit at the top. Packages at the bottom may
be the ones being asked for, but they are also dependencies which aren't
pulled in a lot or don't see an update.

In the end I think popcorn might be better, BUT such tools are also hard to
set up in these days of trolls and GDPR. [Heck, smolt had almost more trolls
in it than regular data by the end of it; so many people set up PDP-11 and
VAX as their hardware running Fedora.]



> and probably more. Popcon and smolt are better because it's actual
> individual system data. On the other hand, they're worse as mentioned
> because opt-in doesn't give a realistic picture.
>
>
> --
> Matthew Miller
> <mat...@fedoraproject.org>
> Fedora Project Leader
> _______________________________________________
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
>


-- 
Stephen J Smoogen.