bug#42162: Recovering source tarballs

Timothy Sample Wed, 26 Aug 2020 14:13:14 -0700

Hi zimoun,

zimoun <[email protected]> writes:


> One question is how this database scales?
>
> For example, a quick back-of-the-envelope estimation leads to ~1.2GB of
> metadata for ~14k packages and then an increase of ~700MB per year,
> both with Ludo’s code [1].
>
> [1] <http://issues.guix.gnu.org/issue/42162#11>

It’s a good question.  A good part of the size comes from the
representation rather than the data.  Compression helps a lot here.  I
have a database of 3,912 packages.  It’s 295M uncompressed (which is a
little better than your estimation).  If I pass each file through Lzip,
it shrinks down to 60M.  That’s more like 15.5K per package, which is
almost an order of magnitude smaller than the estimation you used
(120K).  I think that makes the numbers rather pleasant, but it comes at
the expense of easy storage in Git.
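
The arithmetic above can be checked with a short Python sketch.  It is
only a rough illustration: it assumes “M” means MiB and “K” means KiB,
and that the per-package average stays constant as the collection
grows.  The 14,000 package count comes from the quoted estimation, and
the function names are made up for this example.

```python
# Figures quoted in this message ("M" is assumed to mean MiB).
PACKAGES = 3912
UNCOMPRESSED_MIB = 295
LZIP_MIB = 60


def per_package_kib(total_mib, packages):
    """Average metadata size per package, in KiB."""
    return total_mib * 1024 / packages


def projected_mib(per_pkg_kib, packages):
    """Total size in MiB, assuming the per-package average holds."""
    return per_pkg_kib * packages / 1024


raw_kib = per_package_kib(UNCOMPRESSED_MIB, PACKAGES)
lzip_kib = per_package_kib(LZIP_MIB, PACKAGES)
ratio = UNCOMPRESSED_MIB / LZIP_MIB

# Hypothetical projection to the ~14k packages from the quoted estimation.
projected = projected_mib(lzip_kib, 14_000)

print(f"uncompressed: {raw_kib:.1f} KiB per package")
print(f"lzip:         {lzip_kib:.1f} KiB per package")
print(f"compression:  {ratio:.1f}x smaller")
print(f"14k packages: {projected:.0f} MiB compressed")
```

Reading “M” as decimal megabytes instead moves the per-package figures
by a few percent, which is why the result only roughly matches the
15.5K quoted above.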

> As mentioned [2], should this service be part of SWH (download cooking
> task)?  Or project side?
>
> [2] <https://forge.softwareheritage.org/T2430#47486>

It would be interesting to just have SWH absorb the project.  Since
other distros already know how to produce a “sources.json” and how to
query the SWH archive, it would mean that they would benefit for free
(and so would Guix, for that matter).  I’m open to that, but right now
having the freedom to experiment is important.


-- Tim
