Ludovic Courtès <ludovic.cour...@inria.fr> writes:

> I gave a 10–15mn talk on how Guix uses SWH, what Disarchive is, what
> the current status of the “preservation of Guix” is, and what remains
> to be done:
>
>   https://git.savannah.gnu.org/cgit/guix/maintenance.git/plain/talks/swh-unesco-2021/talk.20211130.pdf

Wow – great work!

> I chatted with the SWH tech team; they’re obviously very busy solving
> all sorts of scalability challenges :-) but they’re also truly
> interested in what we’re doing and in supporting our use case.  Off the
> top of my head, here are some of the topics discussed:
>
>   • ingesting past revisions: if we can give them ‘sources.json’ for
>     past revisions, they’re happy to ingest them;

This is something I can probably coax out of the Preservation of Guix
database.  That might be the cheapest way to do it.  Alternatively, when
we get “sources.json” built with Cuirass, we could tell Cuirass to build
out a sample of previous commits to get pretty good coverage.  (Side
note: eventually we could verify the coverage of the sampling approach
using the Data Service, which has processed a very exhaustive list of
commits.)  There’s a rough sketch of what I have in mind at the end of
this message.

>   • rate limit: we can find an arrangement to raise it for the purposes
>     of statistics gathering like Simon and Timothy have been doing (we
>     can discuss the details off-list);

Cool!  So far it hasn’t been a concern for me, but it would help in the
future if we want to try and track down Git repositories that have gone
missing.  (A sketch of the kind of per-origin check I mean is also at
the end of this message.)

>   • Disarchive: they’d like to better understand the “unknowns” in the
>     PoG plots (I wasn’t sure if it was non-tar.gz tarballs or what) and
>     to work on the definitely-missing origins that show up there;

Many of the unknowns are there for me to track Disarchive progress.
It’s not really the clearest reporting, but it tracks more what Guix can
handle automatically than what we could theoretically know about.
Basically, something is “known” if it can be downloaded from upstream
and either it’s a non-recursive Git reference or it’s something
Disarchive can handle (the heuristic is spelled out in a sketch at the
end of this message).  Hence, we know nothing about other version
control systems and, say, “.tar.bz2” archives.  Also, all these things
are based on heuristics.  :)  As we get closer to 100% known, we can
start analyzing everything more closely.

> they’re not opposed to the idea of eventually hosting or maintaining
> the Disarchive database (in fact one of the developers thought we
> were hosting it in Git and that as such they were already archiving
> it—maybe we could go back to Git?);

It’s a possibility, but right now I’m hopeful that the database will be
in the care of SWH directly before too long.  I’d rather wait and see at
this point.  I’m sure we could manage it, but the uncompressed size of
the Disarchive specification of a Chromium tarball is 366M, and storing
all the XZ specifications uncompressed takes over 20G.  It would be a
big Git repo!

>   • bit-for-bit archival: there’s a tension between making SWH a
>     “canonical” representation of VCS repos and making it a faithful,
>     bit-for-bit identical copy of the original, and there are different
>     opinions in the team here; our use case pretty much requires
>     bit-for-bit copies, and fortunately this is what SWH is giving us in
>     practice for Git repos, so checkout authentication (for example)
>     should work even when fetching Guix from SWH.

That’s interesting.  I’m sure most of us in the Guix camp are on team
bit-for-bit, but I think we can all agree that it’s not easy to get
there.

> There were other discussions about Guix and Nix and I was pleased to see
> people were enthusiastic about functional package management and about
> our whole endeavor.
>
> Anyway I think we can take this as an opportunity to increase bandwidth
> with the SWH developers!

Good idea.  It’s nice when our efforts and experience produce something
useful to the broader free software community.  :)

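Regarding “sources.json” for past revisions, here is roughly what I have
in mind, as a Python sketch rather than real code: pog_origins_for is a
made-up stand-in for a query against the PoG database, and I only emit
the type/urls/integrity fields of the entries Guix already publishes
(the real sources.json carries a bit more metadata than this).

    import json

    def dump_sources_json(commit, pog_origins_for):
        """Write a minimal sources.json-style file for one past commit.

        `pog_origins_for` is a made-up stand-in for a PoG database
        query; it should yield dicts with "type", "urls" and
        "integrity" keys, like the entries Guix publishes today."""
        sources = [{"type": origin["type"],          # "url", "git", ...
                    "urls": origin["urls"],
                    "integrity": origin["integrity"]}
                   for origin in pog_origins_for(commit)]
        with open("sources-{}.json".format(commit), "w") as port:
            json.dump({"sources": sources}, port, indent=2)
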
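As for the statistics gathering, it boils down to one request per origin
against the public SWH API, which is exactly where the rate limit bites.
Something along these lines, using the /api/1/origin/<url>/get/
endpoint; the back-off logic is just a guess at reasonable behaviour:

    import time
    import requests

    API = "https://archive.softwareheritage.org/api/1/origin/{}/get/"

    def origin_archived(url):
        """Return True if SWH knows `url` as an origin, False otherwise."""
        while True:
            response = requests.get(API.format(url))
            if response.status_code == 429:
                # Rate limited: wait as instructed (or a minute) and retry.
                time.sleep(int(response.headers.get("Retry-After", "60")))
                continue
            return response.status_code == 200   # 404 means not archived
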
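And to make the “known” heuristic concrete, this is the shape of it.  It
is not the real PoG code; the attribute names and the list of extensions
Disarchive handles are only illustrative.

    DISARCHIVE_OK = (".tar.gz", ".tgz", ".tar.xz")   # roughly Disarchive's reach today

    def known(origin):
        """Apply the PoG "known" heuristic to one source origin."""
        if not origin.downloadable:          # upstream copy is already gone
            return False
        if origin.kind == "git":
            return not origin.recursive      # SWH gives us plain Git bit-for-bit
        if origin.kind == "url":
            return origin.url.endswith(DISARCHIVE_OK)
        return False                         # other VCSes, ".tar.bz2", ...
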
--
Tim