Hi!

zimoun <zimon.touto...@gmail.com> skribis:

> On Mon, 20 Jul 2020 at 10:39, Ludovic Courtès <l...@gnu.org> wrote:
>> zimoun <zimon.touto...@gmail.com> skribis:
>> > On Sat, 11 Jul 2020 at 17:50, Ludovic Courtès <l...@gnu.org> wrote:
>
>> There are many many comments in your message, so I took the liberty to
>> reply only to the essence of it. :-)
>
> Many comments because many open topics. ;-)

Understood, and they’re very valuable but (1) I choose not to just do
email :-), and (2) I like to separate issues in reasonable chunks rather
than long threads addressing all the problems we’ll have to deal with.

I think it really helps keep things tractable!

>> Lookup issue. :-)  The hash in a CID is not just a raw blob hash.
>> Files are typically chunked beforehand, assembled as a Merkle tree, and
>> the CID is roughly the hash to the tree root.  So it would seem we can’t
>> use IPFS as-is for tarballs.
>
> Using the Git-repo map/table, then it becomes an option, right?
> Well, SWH would be a backend and IPFS could be another one.  Or any
> "cloudy" storage system that could appear in the future, right?

Sure, why not.

>> >>   • If we no longer deal with tarballs but upstreams keep signing
>> >>     tarballs (not raw directory hashes), how can we authenticate our
>> >>     code after the fact?
>> >
>> > Does Guix automatically authenticate code using signed tarballs?
>>
>> Not automatically; packagers are supposed to authenticate code when they
>> add a package (‘guix refresh -u’ does that automatically).
>
> So I miss the point of having this authentication information in the
> future where upstream has disappeared.

What I meant above, is that often, what we have is things like detached
signatures of raw tarballs, or documents referring to a tarball hash:

  https://sympa.inria.fr/sympa/arc/swh-devel/2016-07/msg00009.html

>> But today, we store tarball hashes, not directory hashes.
>
> We store what "guix hash" returns. ;-)
> So it is easy to migrate from tarball hashes to whatever else. :-)

True, but that other thing, as it stands, would be a nar hash (like for
‘git-fetch’), not a Git-tree hash (what SWH uses).

> I mean, it is "(sha256 (base32" and it is easy to have also
> "(sha256-tree (base32" or something like that.

Right, but that first and foremost requires daemon support.

It’s doable, but migration would have to take a long time, since this is
touching core parts of the “protocol”.

> I have not done yet the clear back-to-envelop computations.  Roughly,
> there are ~23 commits on average per day updating packages, so say 70%
> of them are url-fetch, it is ~16 new tarballs per day, on average.
> How the model using a Git-repo will scale?  Because, naively the
> output of "disassemble-archive" in full text (pretty-print format) for
> the hello-2.10.tar is 120KB and so 16*365*120K = ~700Mb per year
> without considering all the Git internals.  Obviously, it depends on
> the number of files and I do not know if hello is a representative
> example.

Interesting, thanks for making that calculation!  We could make the
format more compact if needed.

Thanks,
Ludo’.
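
PS: The estimate quoted above is easy to replay with other inputs.  Here
is a minimal Python sketch of the same arithmetic.  The three input
figures are the ones given in the quoted message; the function name, the
constant names and the use of decimal megabytes are only illustrative
assumptions, and Git's own storage overhead is ignored, as in the
original estimate.

```python
# Back-of-the-envelope estimate of yearly metadata growth.
# The input figures are rough averages taken from the quoted message;
# the names below are illustrative, not part of any Guix tool.

COMMITS_PER_DAY = 23    # package-updating commits per day (average)
URL_FETCH_SHARE = 0.70  # assumed share of those commits using url-fetch
METADATA_KB = 120       # pretty-printed metadata size for hello-2.10.tar


def yearly_growth_mb(commits_per_day, url_fetch_share, metadata_kb, days=365):
    """Return (tarballs per day, MB per year), ignoring Git internals."""
    # 23 * 0.70 is about 16.1, rounded down to 16 as in the message.
    tarballs_per_day = int(commits_per_day * url_fetch_share)
    total_kb = tarballs_per_day * days * metadata_kb
    return tarballs_per_day, total_kb / 1000  # decimal megabytes


if __name__ == "__main__":
    per_day, mb = yearly_growth_mb(COMMITS_PER_DAY, URL_FETCH_SHARE, METADATA_KB)
    print(f"{per_day} tarballs/day, about {mb:.0f} MB/year")
```

The total is linear in the per-tarball size, so a format half as large
would halve the yearly growth; whether hello is representative remains
the open question.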