Hi, I am late to the party…
On Wed, 10 Apr 2024 at 15:57, Ludovic Courtès <l...@gnu.org> wrote:

>> That has happened to me too.
>> Why not use Git directly always?
>
> Because it create{s,d} a bootstrapping issue. The
> “builtin:git-download” method was added only recently to guix-daemon and
> cannot be assumed to be available yet:
>
>   https://issues.guix.gnu.org/65866

[...]

> I think we should gradually move to building everything from
> source—i.e., fetching code from VCS and adding Autoconf & co. as inputs.
>
> This has been suggested several times before. The difficulty, as you
> point out, will lie in addressing bootstrapping issues with core
> packages: glibc, GCC, Binutils, Coreutils, etc. I’m not sure how to do
> that but…

[...]

> … live-bootstrap can probably be a good source of inspiration to find a
> way to build those core packages (or some of them) straight from a VCS
> checkout.

IMHO, we need to distinguish the cases, because there are different
types of issues and thus different potential workarounds. :-)

 1. Bootstrap how to download source code.
 2. Bootstrap how to build core packages.
 3. Bootstrap the driver (say guix-daemon and helpers).

Well, having solutions for #1 and #3 would naturally provide a solution
for #2. Although the devil is in the details. ;-)


About #1
========

You cannot use the binary ’git’ in order to download the source code of
Git to build the binary ’git’. Yeah, circular dependency. :-)

Therefore, the Git source code is pulled using another method, say from
a tarball. That method also needs to be built from source, so it in turn
needs yet another method. The usual chicken-and-egg problem.

The current workaround is to “hide” the problem and introduce a
“builtin:download” method: it’s an “opaque” binary that is hard to
inspect. Roughly, the workaround was introduced by [1] in Oct. 2016.
Almost 8 years ago, so it works! :-)

The argument for accepting this “opaque” method is that it is a
fixed-output derivation.
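To make the fixed-output argument concrete, here is a minimal sketch. It
is in Python purely for illustration (nothing here is Guix code), the
fetcher functions are hypothetical stand-ins for download methods, and
plain hex digests are used for simplicity. It only shows the idea: the
result is accepted or rejected based on its checksum, whatever produced
it.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def fixed_output_fetch(fetcher, expected_sha256: str) -> bytes:
    """Run a fetcher and accept its result only if the hash matches.

    The fetcher is treated as a black box: only the checksum of what it
    returns is checked, never how the bytes were obtained.
    """
    data = fetcher()
    actual = sha256_hex(data)
    if actual != expected_sha256:
        raise ValueError(
            f"hash mismatch: expected {expected_sha256}, got {actual}"
        )
    return data


# Hypothetical stand-ins for download methods; no network is involved.
SOURCE = b"pretend this is a source tarball\n"


def opaque_fetcher() -> bytes:
    """Stand-in for a method whose implementation is hard to inspect."""
    return SOURCE


def transparent_fetcher() -> bytes:
    """Stand-in for a method whose builder and inputs can be audited."""
    return bytes(SOURCE)


def tampering_fetcher() -> bytes:
    """Stand-in for a method that returns modified content."""
    return SOURCE + b"extra bytes"


# The checksum is known beforehand, independently of any fetcher.
EXPECTED = sha256_hex(SOURCE)

if __name__ == "__main__":
    assert fixed_output_fetch(opaque_fetcher, EXPECTED) == SOURCE
    assert fixed_output_fetch(transparent_fetcher, EXPECTED) == SOURCE
    try:
        fixed_output_fetch(tampering_fetcher, EXPECTED)
    except ValueError as err:
        print("rejected:", err)
```

Note what the check does not tell you: which fetcher actually ran. Two
users can get the same verified bytes through different methods.
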
In other words, we know the SHA256 checksum beforehand. Thus the claim
is: being “opaque” does not matter because the SHA256 checksum can be
computed independently and all the source code can be audited.

To cut another cycle, another “opaque” method was introduced:
“builtin:git-download”. The same reasoning applies.

Do not get me wrong about “opaque”. I mean that the method depends on
the pair (user revision, daemon revision). In other words, it is not
straightforward to know whether Alice and Bob are using the exact same
method for downloading source code. Since it is not fully transparent,
it is “opaque”. :-)

Somehow, we are applying to everything what we only need for cutting a
specific circular dependency. We have some packages named
’foo-bootstrap’ that aim to solve a dependency problem between packages,
yet we do not use them everywhere; we only use them to cut a circular
dependency.

I think a similar strategy should be applied to the fetch methods. We
could have “git-fetch” relying on the initial Git method, i.e., a
transparent derivation where it’s straightforward to audit everything:
the dependencies and the builder. And for some specific cases, we could
have “git-fetch/bootstrap” relying on “builtin:git-download”. That would
make it easier to know which packages deserve particular care.

I think that applying “builtin:download” and “builtin:git-download” to
every “url-fetch” and “git-fetch” lowers the overall level of
transparency in order to solve a very specific bootstrapping problem.

Last about #1, please note that transparency does not come for free and
has drawbacks: when running, say, “guix time-machine -C past.scm --
build -S”, all the dependencies for downloading would be the ones of
past.scm. In other words, to download today the source code of a
5-year-old package, say using ’hg-fetch’, we need Python and Mercurial
as they were 5 years ago, even though we do not expect the downloaded
content to differ from what today’s Python and Mercurial would fetch.


About #3
========

That’s the very hard topic!
The bootstrapping story is not fully done yet. Assuming trust for #1,
the bootstrap of Guix starts with ’bootstrap-seeds’, roughly 232KiB.
Take a moment, that’s impressive, :-) right? Obviously, I leave aside
Haskell, Ocaml@5, etc.

Well, diving further. These 232KiB alone are not enough. It also
requires helpers: tar (1.3MiB), bash (1.3MiB), mkdir (0.7MiB) and xz
(0.844MiB). Moreover, it requires two drivers: a static Guile binary
(14MiB) and guix-daemon.

You get it: how to trust these helpers? Two approaches: (a) implement
something directly in hex/assembler and/or (b) exploit the Guile binary
(à la Scheme on bare metal).

About guix-daemon, one solution is a daemon written directly in Guile
and compatible with that very Guile binary. Or at least, a minimalist
daemon with just enough features for building up to guix-daemon.

Another option is the “Extreme bootstrapping” [3], which is my
understanding of live-bootstrap. Somehow, remove guix-daemon from the
picture and convert the derivation (the one read by guix-daemon) to a
minimal Guile script that would be executed during startup. See the
proof-of-concept in the branch wip-system-bootstrap [4].

Just my lengthy opinion… Or maybe some ideas for GSoC. ;-)

1: https://issues.guix.gnu.org/22774#3
2: https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-building-from-source-all-the-way-down
3: https://guix.gnu.org/en/blog/2019/reproducible-builds-summit-5th-edition
4: https://git.savannah.gnu.org/cgit/guix.git/log/?h=wip-system-bootstrap

Cheers,
simon