Memory usage approximations, per tuple:
- IPv6 address: 16 bytes
- package pointer: 3 bytes (assuming <16,777,216 packages)
- version pointer: 2 bytes (assuming <65,536 distinct version names)
- plus some overhead

=> ~40 B per tuple seems fair?

But you could also just write to disk. That will wear out an SSD, though, and random reads/writes on a hard drive are slow.
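A minimal sketch of that layout, assuming a packed binary record (field widths as above; the extra ~19 bytes would be hash-table/container overhead, not payload):

```python
import struct

# Packed (ip, package, version) tuple:
#   16-byte IPv6 address
# + 3-byte package index  (<16,777,216 packages)
# + 2-byte version index  (<65,536 distinct version names)
RECORD = struct.Struct("16s3s2s")

print(RECORD.size)  # 21 bytes of payload; overhead pushes the total toward ~40 B
```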
> Where has 100 packages come from here?

That would be the average number of downloaded packages per IP per day. I assume some users would download just a single package, while others are installing an entire system of 1000+ packages, but 100 on average seems a fair ballpark number. What number do you suggest?

On Sat, May 3, 2025 at 11:39 AM Adam D. Barratt <a...@adam-barratt.org.uk> wrote:
>
> On Sat, 2025-05-03 at 11:16 +0200, Erik Schulz wrote:
> > I suspect that compliance with GDPR would require the data to be
> > stored minimally.
> > It seems reasonable to me that a 24-hour window would reduce most
> > repeat-downloads.
> > If you stream the request log and reduce to (ip, package, version), it
> > will be minimal.
> > I think it would fit into memory, e.g. 10 million unique IP addresses
> > x 100 packages x 40 bytes = 40 GB
>
> Where has 100 packages come from here? There are 34 *thousand* source
> packages in bookworm, i.e. over 100 times your quoted estimate.
>
> You also seem to have underestimated quite a bit if you believe that
> you can fit an IPv6 address, a package name and a package version into
> 40 bytes in most cases, let alone all.
>
> (As an aside, the RAM allocation on the logging hosts is currently
> 2GB.)
>
> Regards,
>
> Adam
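For reference, the 40 GB figure in the quoted exchange is just the product of the three assumed quantities (unique IPs, average packages per IP per day, bytes per tuple), all of which are rough estimates:

```python
unique_ips = 10_000_000   # assumed unique client IPs per day
pkgs_per_ip = 100         # assumed average packages downloaded per IP
bytes_per_tuple = 40      # estimated in-memory size of one (ip, package, version) tuple

total_bytes = unique_ips * pkgs_per_ip * bytes_per_tuple
print(total_bytes / 10**9)  # 40.0 (GB)
```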