Hi Ludovic,

Ludovic Courtès <ludovic.cour...@inria.fr> writes:

> This job is disassembling all the .tar.gz files packages refer to,
> using the recently-added ‘etc/disarchive-manifest.scm’ file:
>
>   https://ci.guix.gnu.org/jobset/disarchive
>
> It has just succeeded for the first time. :-)

Fantastic!  I feel bad that I left you holding the bag on this one,
though.  Sorry.  I’ve been a little adrift this summer.  Thanks for
picking it up!

> Where to go from here?  Timothy Sample had already set up a
> Disarchive database at <https://disarchive.ngyro.com>, which
> (guix download) uses as a fallback; I’m not sure exactly how it’s
> populated.

Basically the same way as what you are doing now.  I have many Cuirass
jobs, and I use the build outputs mechanism (mentioned by Mathieu
elsewhere in this thread).  I don’t have a “disarchive-collection”
job, so I have to use the Cuirass API to dig through the recent build
outputs and find new results.  This happens from a cron job, which
uploads each new result to my server.  (The query step is sketched at
the end of this message.)

One simple but satisfying thing that I do is serve the files
compressed.  That is, they are compressed on disk, and nginx just
passes them along (using the “gzip_static” module).  Because of
Disarchive’s verbose and repetitive output format, this makes for a
huge reduction in storage requirements.  (That’s sketched below,
too.)

> The goal here would be for the Guix project to set up infrastructure
> populating a database automatically and creating backups, possibly
> via SWH (we’ll have to discuss it with them).
>
> A plan we can already deploy would be:
>
>   1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
>
>   2. On berlin, add an mcron job that periodically copies the output
>      of the latest “disarchive-collection” build to a directory, say
>      /srv/disarchive.  Thus, the database would accumulate tarball
>      metadata over time.
>
>   3. Add an nginx route so that /srv/disarchive is served at
>      https://disarchive.guix.gnu.org.
>
>   4. Add disarchive.guix.gnu.org to (guix download).
>
> How does that sound?  Thoughts?

This is great!  To make things concrete, I’ve appended rough sketches
of possible takes on steps 2–4 below as well.

I can offer some past metadata, too.  Specifically, I have ~14000
files that I generated while digging into SWH coverage.  (That’s a
project I’d like to return to, but I’m still trying to get my head
back in the game and pick up where I left off.)
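Here are those sketches.  First, the query step of my cron job looks
roughly like this.  It’s simplified and from memory:
“/api/latestbuilds” is a real Cuirass endpoint, but the JSON field
names below are assumptions, so check the Cuirass manual.

  (use-modules (json)              ;guile-json
               (srfi srfi-1)       ;filter-map
               (ice-9 receive)
               (web client)
               (web uri))

  (define (latest-build-outputs cuirass jobset)
    "Return outputs of recent successful JOBSET builds at CUIRASS."
    (receive (response body)
        (http-get (string->uri
                   (string-append cuirass
                                  "/api/latestbuilds?nr=50&jobset="
                                  jobset)))
      (let ((builds (json-string->scm
                     (if (string? body) body (utf8->string body)))))
        (filter-map (lambda (build)
                      ;; A “buildstatus” of zero means success; these
                      ;; field names are from memory.
                      (and (equal? 0 (assoc-ref build "buildstatus"))
                           (assoc-ref build "buildoutputs")))
                    (vector->list builds)))))

Each new result then gets compressed and uploaded.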
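The nginx side is a single directive.  On a machine managed with Guix
System (berlin is; mine is not), it might look like the following,
which is also more or less step 3 of your plan.  It assumes the nginx
package is built with ngx_http_gzip_static_module:

  (use-modules (gnu services) (gnu services web))

  (service nginx-service-type
           (nginx-configuration
            (server-blocks
             (list (nginx-server-configuration
                    (server-name '("disarchive.guix.gnu.org"))
                    (root "/srv/disarchive")
                    ;; For a request for “foo”, send the precompressed
                    ;; “foo.gz” sitting next to it instead of
                    ;; compressing on the fly.
                    (raw-content '("gzip_static on;")))))))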
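Step 2 might be an mcron job along these lines.  Here
“/path/to/latest-output” is a placeholder: I don’t know offhand the
best way to resolve the latest successful “disarchive-collection”
output on berlin.

  (use-modules (gnu services) (gnu services mcron) (guix gexp))

  ;; Accumulate tarball metadata in /srv/disarchive over time.
  (simple-service 'disarchive-database mcron-service-type
                  (list #~(job "15 * * * *"   ;once an hour
                               "rsync -a /path/to/latest-output/ /srv/disarchive/")))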
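And step 4, if I remember the variable name right, is a one-line
change to ‘%disarchive-mirrors’ in guix/download.scm:

  (define %disarchive-mirrors
    '("https://disarchive.guix.gnu.org"
      "https://disarchive.ngyro.com"))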
--
Tim