Re: Package statistics by downloads

Philipp Kern Sat, 03 May 2025 00:45:16 -0700

On 2025-05-03 03:35, Otto Kekäläinen wrote:

I'm interested in package popularity. I'm aware of popcon
(https://popcon.debian.org/), but I'm more interested in actual
downloads.


I am also interested in usage statistics. I feel it is much more
meaningful to work on packages that I know have a lot of users.

While neither popcon nor download stats are accurate, they still show
trends and relative numbers which can be used to make useful
conclusions. I would be glad to see if people could share ideas on
what stats we could collect and publish instead of just pointing out
flaws in various stats.

The problem is that we currently do not want to retain this data. It'd
require a clear measure of usefulness, not just a "it would be nice if
we had it". And there would need to be actual criteria of what we would
be interested in. Raw download count? Some measure of bucketing by
source IP or not? What about container/hermetic builders fetching the
same ancient package over and over again from snapshot? Does the version
matter?

In the end there would probably need to be a proof of concept of a log
processor that's privacy-friendly and gives us the metrics that we
actually want. Hence my question what these metrics are for, except for
a fuzzy feeling of "working on the right priorities". There will be lots
of packages that are rarely downloaded and still important.
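To make the open questions concrete, here is a minimal sketch of what
such a log processor might look like. Everything in it is an assumption
for illustration: the three-field log line format, the names
PACKAGE_RE, bucket and aggregate, and the choice of a salted, truncated
SHA-256 as the bucketing key. It is not an existing Debian tool. It
reports both a raw download count and a count of distinct client
buckets per package and day, and it never stores the source IP itself.

```python
import hashlib
import re
from collections import defaultdict

# Assumed pool layout: .../pool/<component>/<prefix>/<source>/<binary>_<version>_<arch>.deb
PACKAGE_RE = re.compile(
    r"/pool/[^/]+/[^/]+/(?P<source>[^/]+)/"
    r"(?P<binary>[^_/]+)_(?P<version>[^_/]+)_[^/]+\.deb$"
)


def bucket(ip: str, day: str, salt: bytes) -> str:
    """Map a client IP to an opaque per-day bucket id (hypothetical scheme)."""
    digest = hashlib.sha256(salt + day.encode() + ip.encode()).hexdigest()
    return digest[:16]


def aggregate(lines, salt: bytes):
    """Return {(day, binary): (raw_downloads, distinct_client_buckets)}.

    Each input line is assumed to be "<YYYY-MM-DD> <client-ip> <path>".
    Lines that do not match, or that are not .deb fetches, are skipped.
    """
    raw = defaultdict(int)
    clients = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        day, ip, path = parts
        match = PACKAGE_RE.search(path)
        if match is None:
            continue
        key = (day, match["binary"])
        raw[key] += 1
        clients[key].add(bucket(ip, day, salt))
    return {key: (raw[key], len(clients[key])) for key in raw}
```

In this sketch a builder fetching the same package repeatedly from one
address inflates the raw count but not the bucket count, which is one
way to look at the container case. For the scheme to be privacy-friendly
in practice, the salt would presumably have to be random per day and
discarded afterwards. Keying on the version as well would only need
match["version"] added to the key.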

Everyone can ask "please just retain all logs and we will do analysis on
them later". Right now it'd be infeasible to get the statistics from the
mirrors, and we could at most get statistics for deb.d.o. To give a
sense of scale: We are sampling 1% of cache hits and all errors right
now. That's 6.7 GB/d uncompressed (500 M/d compressed). Back of the
envelope math says that'd be 600 GB/d of raw syslog log traffic. We
should have a very good reason for collecting this much data.
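The back-of-the-envelope estimate can be reproduced roughly as follows.
This is only a sketch: the function name full_volume_gb and the
error_gb parameter are made up for illustration, and the share of the
6.7 GB/d that comes from errors (which are already logged in full) is
not stated above, so the default treats the whole sample as cache hits
and yields an upper bound.

```python
SAMPLE_RATE = 0.01           # 1% of cache hits are logged
SAMPLED_GB_PER_DAY = 6.7     # uncompressed size of the current sample
COMPRESSED_GB_PER_DAY = 0.5  # 500 M/d compressed


def full_volume_gb(sampled_gb, error_gb=0.0, sample_rate=SAMPLE_RATE):
    """Scale the sampled cache hits up to 100%; errors are not scaled."""
    hits_gb = sampled_gb - error_gb
    return hits_gb / sample_rate + error_gb


# Upper bound, assuming the whole sample consists of cache hits.
upper_bound = full_volume_gb(SAMPLED_GB_PER_DAY)

# Ratio between the uncompressed and compressed sample sizes.
compression_ratio = SAMPLED_GB_PER_DAY / COMPRESSED_GB_PER_DAY

print(f"upper bound: {upper_bound:.0f} GB/d uncompressed")
print(f"compression ratio: {compression_ratio:.1f}x")
```

The upper bound comes out somewhat above the 600 GB/d quoted above; any
non-zero error share passed as error_gb lowers the estimate, so the two
figures are in the same ballpark.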


Kind regards
Philipp Kern
