Hi,

On Thu, 16 Mar 2023 at 18:45, Ludovic Courtès <l...@gnu.org> wrote:
>> For sure, we have to fix the holes and bugs. :-) However, I am asking
>> what we could add to get more robustness in the long term.

> Sources (fixed-output derivations) are already content-addressed, by
> definition (I prefer “content addressing” over “intrinsic
> identification” because that’s a more widely recognized term).

This is the case when you consider that the result of the fixed-output
derivation is already inside the Guix “ecosystem”…

> In a way, like Maxime was saying, the URL/URI is just a hint; what
> matters is the content hash that appears in the origin.

…but otherwise the URL/URI is not just a “hint”.  Or could you explain
what you mean by a “hint”?  Maybe I am misunderstanding something, but
from my understanding, the URL/URI is a “hint” only when substitutes are
available; otherwise Guix relies on the plain URL/URI to fetch the data.

--8<---------------cut here---------------start------------->8---
$ guix build hello -S --no-substitutes --check
The following derivation will be built:
   /gnu/store/3hxraqxb0zklq065zjrxcs199ynmvicy-hello-2.12.1.tar.gz.drv
building /gnu/store/3hxraqxb0zklq065zjrxcs199ynmvicy-hello-2.12.1.tar.gz.drv...
Starting download of /gnu/store/1s6xba6nafkxb242kafkg3x10jkdn2n9-hello-2.12.1.tar.gz
From https://ftpmirror.gnu.org/gnu/hello/hello-2.12.1.tar.gz...
following redirection to `https://mirror.cyberbits.eu/gnu/hello/hello-2.12.1.tar.gz'...
downloading from https://ftpmirror.gnu.org/gnu/hello/hello-2.12.1.tar.gz ...
warning: rewriting hashes in `/gnu/store/3dq55rw99wdc4g4wblz7xikc8a2jy7a3-hello-2.12.1.tar.gz'; cross fingers
--8<---------------cut here---------------end--------------->8---

In other words, when speaking about robustness (in the broad sense), I
think we cannot assume that the content designated by the “content
addressing” of the derivation,

--8<---------------cut here---------------start------------->8---
Derive
([("out","/gnu/store/3dq55rw99wdc4g4wblz7xikc8a2jy7a3-hello-2.12.1.tar.gz","sha256","8d99142afd92576f30b0cd7cb42a8dc6809998bc5d607d88761f512e26c7db20")]
 ,[]
 ,["/gnu/store/0mxnx8l4fgigvd7gakwdk6hc6im4wnai-disarchive-mirrors","/gnu/store/ckxc05iflc8jagdxwh4z1cxc23mb6i6q-mirrors","/gnu/store/wg1yp2vx8gb7qmcgyibqnwblahpp4bjg-content-addressed-mirrors"]
 ,"x86_64-linux","builtin:download",[]
 ,[("content-addressed-mirrors","/gnu/store/wg1yp2vx8gb7qmcgyibqnwblahpp4bjg-content-addressed-mirrors")
 ,("disarchive-mirrors","/gnu/store/0mxnx8l4fgigvd7gakwdk6hc6im4wnai-disarchive-mirrors")
 ,("impureEnvVars","http_proxy https_proxy LC_ALL LC_MESSAGES LANG COLUMNS")
 ,("mirrors","/gnu/store/ckxc05iflc8jagdxwh4z1cxc23mb6i6q-mirrors")
 ,("out","/gnu/store/3dq55rw99wdc4g4wblz7xikc8a2jy7a3-hello-2.12.1.tar.gz")
 ,("preferLocalBuild","1")
 ,("url","\"mirror://gnu/hello/hello-2.12.1.tar.gz\"")])
--8<---------------cut here---------------end--------------->8---

is still available; instead, Guix has to rely on another system (here
’url’).  In a way, I am proposing to optionally add more “content
addressing” than the current NAR+SHA256 (and URL/URI), so that other
“content addressing” systems can then be exploited.

> So it seems to me that the basics are already in place.

Well, there are two possible options: (1) rely on an external service
that would bridge the different content addressing systems (such as
extending the Disarchive database, or hoping SWH will do it :-)), but
this external service would need to be always available; or (2) extend
the information in packages (optional fields, etc.).
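For what it is worth, the `8d99142a…db20` value in the derivation above is the hex form of a SHA-256, while Guix origins spell the same hash in Nix-style base32 (for hello 2.12.1, `086vqwk2…196cd`).  Below is a minimal Python sketch of that encoding, assuming the usual Nix convention: an alphabet without e, o, t, u, the first character being the most significant digit, and the hash bytes read as a little-endian number.  The helper names are hypothetical, for illustration only.

```python
# Nix/Guix base32 alphabet: 0-9 and a-z without e, o, t, u.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32_length(size: int) -> int:
    """Number of base32 characters needed for a hash of `size` bytes."""
    return (size * 8 - 1) // 5 + 1


def nix_base32_to_hex(s: str, size: int = 32) -> str:
    """Decode a Nix/Guix base32 hash (e.g. an origin's sha256) to hex."""
    if len(s) != nix_base32_length(size):
        raise ValueError(f"expected {nix_base32_length(size)} characters, got {len(s)}")
    value = 0
    for ch in s:
        digit = NIX_BASE32.find(ch)
        if digit < 0:
            raise ValueError(f"invalid base32 character: {ch!r}")
        value = value * 32 + digit  # first character is the most significant digit
    try:
        # The hash bytes are the little-endian representation of that number.
        return value.to_bytes(size, "little").hex()
    except OverflowError:
        raise ValueError("base32 string encodes more than `size` bytes") from None


def hex_to_nix_base32(h: str) -> str:
    """Inverse of nix_base32_to_hex: encode a hex digest as Nix/Guix base32."""
    raw = bytes.fromhex(h)
    value = int.from_bytes(raw, "little")
    n = nix_base32_length(len(raw))
    # Emit the most significant 5-bit group first.
    return "".join(NIX_BASE32[(value >> (5 * i)) & 0x1F] for i in reversed(range(n)))


if __name__ == "__main__":
    print(nix_base32_to_hex("086vqwk2wl8zfs47sq2xpjc9k066ilmb8z6dn0q6ymwjzlm196cd"))
```

Run on the origin's base32 string, this prints the same hex digest as the derivation's `out` entry, i.e. both lines carry one and the same content address in two encodings.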
Moreover, regarding (1), all third-party channels would have to be
ingested by this external service.  For SWH, that’s possible.  For the
Disarchive database, it would mean registering each third-party channel,
or having those channels maintain their own database.  This is unlike
(2), where the identifier would optionally be part of the package
definition.

> What’s missing, both in SWH and in Guix, is the ability to store
> multiple hashes.  SWH could certainly store several hashes, computed
> using different serialization and hash algorithm combinations.

Please note that Guix currently relies on a “hint” when SWH is used as a
fallback.  For instance, in most git-fetch cases, Guix provides the SWH
API with context (URL and Git tag) and lets SWH resolve it in order to
find the content addressing identifier.  This works in many cases, but it
fails when history has been rewritten, e.g., when upstream replaces a tag
in place.  And this strategy does not work with Subversion (svn-fetch),
Mercurial (hg-fetch), or others.  It requires more work on our side
(parsing the result of the query, extracting the relevant information,
etc.).  Nothing impossible, but far from done, IMHO. :-)

Well, I still have mixed feelings about the robustness of the SWH
fallback. :-)

> This is what you suggested at
> <https://gitlab.softwareheritage.org/swh/meta/-/issues/4538>; it was
> also discussed in the thread at
> <https://sympa.inria.fr/sympa/arc/swh-devel/2016-07/msg00019.html>.  It
> would be awesome if SWH would store Nar hashes; that would solve all our
> problems, as you explained.

Yeah, that’s nice. :-)  The progress is tracked by:

    https://gitlab.softwareheritage.org/swh/meta/-/issues/4979

and the first part, computing the NAR, is now merged, IIUC, with:

    https://gitlab.softwareheritage.org/swh/devel/swh-loader-core/-/merge_requests/459

However, exposing this NAR hash via their API, and then bridging
NAR -> swhid, is not planned on the SWH side yet, AFAIK.
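To make “different serialization and hash algorithm combinations” concrete, here is a small Python sketch computing several identifiers for the same bytes: a flat SHA-256, a plain SHA-1 (no serialization), and the SHA-1 of the Git blob serialization, which is what an SWH content identifier (`swh:1:cnt:`) is based on.  The dictionary keys are illustrative labels, not an existing Guix or SWH API; NAR hashes are left out since they require the NAR serialization.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """SHA-1 of the Git blob serialization: b"blob <size>\\0" followed by the data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def content_identifiers(data: bytes) -> dict:
    """Several content addresses for the same bytes (labels are hypothetical)."""
    blob = git_blob_sha1(data)
    return {
        "sha256": hashlib.sha256(data).hexdigest(),   # flat hash of the file
        "none+sha1": hashlib.sha1(data).hexdigest(),  # plain SHA-1, no serialization
        "git+sha1": blob,                              # Git blob serialization + SHA-1
        "swhid": "swh:1:cnt:" + blob,                  # SWH content identifier
    }


if __name__ == "__main__":
    for label, value in content_identifiers(b"hello\n").items():
        print(f"{label:10} {value}")
```

The point of the sketch: one file, one set of bytes, yet four different identifiers, so any bridge (in SWH, Disarchive, or package definitions) has to record which serialization and which algorithm each identifier refers to.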
> The other option, storing multiple hashes for each origin in Guix, doesn’t
> sound practical: I can’t imagine packages storing and updating more than
> one content hash per package.  That doesn’t sound reasonable.  Plus it
> would be a long-term solution and wouldn’t help today.

Storing a list of content addressing identifiers (NAR+SHA256, Git+SHA1,
GNUnet, IPFS, etc.) would add robustness, IMHO.  In other words, it is
not practical to have a ’gnunet-fetch’ method as proposed in [1], but we
could optionally have:

    (origin
      (method url-fetch)
      (uri (string-append "mirror://gnu/hello/hello-" version ".tar.gz"))
      (sha256
       (base32 "086vqwk2wl8zfs47sq2xpjc9k066ilmb8z6dn0q6ymwjzlm196cd"))
      (identifiers
       (list (gnunet "Y48PGS5RVX643NT2B7GDNFCBT4DWG692PF4YNHERR96K6MSFRZ4ZWRPQ4KVKZV29MGRZTWAMY9ETTST4B6VFM47JR2JS5PWBTPVXB0.8A9HRYABJ7HDA7B0")
             (git+sha1 "swh:1:dir:013573086777370b558b1a9ecb6d0dca9bb8ea18")
             (none+sha1 "8f261739d33d31867ab9c5fa26f973c37da26ca5"))))

And we could also have the Git commit hash (for packages using the
git-fetch method), etc.  Having an optional ’identifiers’ field would
help today for all fetch methods other than url-fetch and git-fetch.

Sure, it is not straightforward.  For instance, how do we ensure
consistency?  Via “guix lint”?  Something else?

Well, on the other hand, sometimes I would like to have a list of
sources using different fetch methods: say, first try this url-fetch,
then this git-fetch, then the SWH fallback, etc.

To me, the other viable option would be to extend the Disarchive
database and the services around it.

Thoughts?

Cheers,
simon

1: https://issues.guix.gnu.org/44199#0-lineno68
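The two ideas in the message, an ordered list of sources (url-fetch, then git-fetch, then the SWH fallback) and a lint-style consistency check of optional identifiers, can be sketched roughly as follows.  This is a hypothetical Python model, not Guix code: each fetcher is a plain callable standing in for a real fetch method, the declared SHA-256 stays the single authority for accepting content, and the extra identifiers are only cross-checked.  The `swh-cnt` label is made up for this sketch; directory-level identifiers such as the `swh:1:dir:` one in the origin above, or GNUnet ones, need other tooling and are simply skipped.

```python
import hashlib
from typing import Callable, Iterable, List, Optional, Tuple

# A source is a (label, fetcher) pair; a fetcher returns the bytes, or None on failure.
Source = Tuple[str, Callable[[], Optional[bytes]]]


def fetch_first_valid(sources: Iterable[Source], expected_sha256: str) -> Tuple[str, bytes]:
    """Try each source in order; accept the first whose content matches the declared hash."""
    tried = []
    for label, fetch in sources:
        try:
            data = fetch()
        except OSError:
            data = None  # network or I/O failure: move on to the next source
        if data is not None and hashlib.sha256(data).hexdigest() == expected_sha256:
            return label, data
        tried.append(label)
    raise LookupError("no source provided the expected content; tried: " + ", ".join(tried))


def check_identifiers(data: bytes, identifiers: dict) -> List[str]:
    """Return the labels of declared identifiers that do not match `data`."""
    computers = {
        "none+sha1": lambda d: hashlib.sha1(d).hexdigest(),
        "swh-cnt": lambda d: "swh:1:cnt:" + hashlib.sha1(b"blob %d\x00" % len(d) + d).hexdigest(),
    }
    mismatches = []
    for label, declared in identifiers.items():
        compute = computers.get(label)
        if compute is None:
            continue  # unknown kind (e.g. gnunet, swh:1:dir): cannot check from file bytes
        if compute(data) != declared:
            mismatches.append(label)
    return mismatches
```

One possible design choice, under these assumptions: a `guix lint`-like checker would call `check_identifiers` on the downloaded file and report mismatches, while the extra identifiers would only ever be used to locate content, never to decide what is accepted.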