Memory usage approximations, per tuple:
- IPv6 address: 16 bytes
- package pointer: 3 bytes (assuming <16,777,216 packages)
- version pointer: 2 bytes (assuming <65,536 distinct version names)
- plus some overhead

=> ~40 B per tuple seems fair?

But you could also just write to disk. That will wear out an SSD, though, and random reads/writes on a hard drive are slow.
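A minimal sketch of that layout, assuming a packed binary record (field widths as above; the extra ~19 bytes would be hash-table/container overhead, not payload):

```python
import struct

# Packed (ip, package, version) tuple:
#   16-byte IPv6 address
# + 3-byte package index  (<16,777,216 packages)
# + 2-byte version index  (<65,536 distinct version names)
RECORD = struct.Struct("16s3s2s")

print(RECORD.size)  # 21 bytes of payload; overhead pushes the total toward ~40 B
```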
> Where has 100 packages come from here?

That would be the average number of downloaded packages per IP per day. I assume some users would download just a single package, while others are installing an entire system of 1000+ packages, but 100 on average seems a fair ballpark number. What number do you suggest?

On Sat, May 3, 2025 at 11:39 AM Adam D. Barratt <a...@adam-barratt.org.uk> wrote:
>
> On Sat, 2025-05-03 at 11:16 +0200, Erik Schulz wrote:
> > I suspect that compliance with GDPR would require the data to be
> > stored minimally.
> > It seems reasonable to me that a 24-hour window would reduce most
> > repeat-downloads.
> > If you stream the request log and reduce to (ip, package, version), it
> > will be minimal.
> > I think it would fit into memory, e.g. 10 million unique IP addresses
> > x 100 packages x 40 bytes = 40 GB
>
> Where has 100 packages come from here? There are 34 *thousand* source
> packages in bookworm, i.e. over 100 times your quoted estimate.
>
> You also seem to have underestimated quite a bit if you believe that
> you can fit an IPv6 address, a package name and a package version into
> 40 bytes in most cases, let alone all.
>
> (As an aside, the RAM allocation on the logging hosts is currently
> 2GB.)
>
> Regards,
>
> Adam
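For reference, the 40 GB figure in the quoted exchange is just the product of the three assumed quantities (unique IPs, average packages per IP per day, bytes per tuple), all of which are rough estimates:

```python
unique_ips = 10_000_000   # assumed unique client IPs per day
pkgs_per_ip = 100         # assumed average packages downloaded per IP
bytes_per_tuple = 40      # estimated in-memory size of one (ip, package, version) tuple

total_bytes = unique_ips * pkgs_per_ip * bytes_per_tuple
print(total_bytes / 10**9)  # 40.0 (GB)
```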