Hello all,

the title says it all, I wish to share some conclusions from working on the core-updates merge. Clearly our tooling could be improved for the task; there was some flying by night without instruments, and in the end I merged the branch without really being able to tell how it compared to master... (You may also blame it partially on my lack of patience.) Having feature branches may or may not make things a bit easier, but it will definitely not solve the problems.

This mail is of course also a bit politically sensitive: It may look like I am complaining about the work of other people, who are volunteers and do what they can, without offering to work on the code myself. So as a preamble, let me express my gratitude to the few people who have been working tirelessly on our tooling and contributing to our infrastructure, without whom big code changes like the ones we made on core-updates (and now on feature branches) would simply be impossible; their work is vital to the project and often not very visible. If I am critical, it is not to diminish their work, but to discuss a positive path forward; and I hope more people will find the motivation to do infrastructure work, which I think will be decisive for the success of Guix (together with policy and organisational questions).
We have two build farms, berlin and bordeaux (which is a good thing for checking reproducibility and for redundancy, but maybe a bit of a problem concerning hardware requirements for "exotic" architectures), running two different CI projects, cuirass and the Guix build coordinator (gbc in the following); both have a very low bus factor (1 to 2?), and it would be nice to get more people on board. For this, more documentation would be helpful. Both have pros and cons and are architected quite differently, so I do not know whether convergence is achievable.

I ended up relying mostly on cuirass for reasons I do not completely remember any more. The dashboard with its green and red dots is a very useful tool compared to lists of builds, which become unusable with over 20000 packages. The bigger build power on berlin is helpful, and I found the web interface of gbc a bit slow and down a bit too often.

With this experience, I just filed three wishlist bugs for cuirass:

- Topological sorting in cuirass
  https://issues.guix.gnu.org/63412
  The lack of ordering of the builds is a big problem that wastes a lot of build power; it is solved in gbc and, I think, the reason why the bordeaux build farm fares better for aarch64 with fewer machines. I would tag this as "important".

- Evaluation comparison on cuirass
  https://issues.guix.gnu.org/63414
  Without being able to compare a branch to master, it is difficult to decide whether one should merge. This is sort of solved in gbc, but so far the bordeaux build farm has been used more for QA of single patches (or a short list of patches featuring in a single issue) than for building complete branches.

- Stop and restart builds in cuirass
  https://issues.guix.gnu.org/63413
  Manual intervention is not easy in cuirass (I spent hours clicking on "restart" or using the REST API with a shell script through wget, which resulted in my IP being banned as a DoS suspect...); and to my knowledge, there is no web interface for doing so in gbc.
In both systems one can probably tinker with the underlying databases, but this does not qualify as "easy" either.

gbc just got a very nice feature on "blocking builds":
https://data.guix.gnu.org/revision/8f92dfd9ae7ac491ab7fb4b425799a8c909708a8/blocking-builds?system=aarch64-linux&target=none&limit_results=50
As I understand them, these are the "first failures": derivations whose inputs are all available, but which themselves fail to build; so they show the place where work is needed (and where repairs will immediately make a difference). Once the topological sorting in cuirass is sorted out, these should be the builds marked as "Failed" (as opposed to "Failed (dependency)"), so with the first issue above handled, they could easily be shown by cuirass as well.

This was a long message to say "I filed three bugs", but maybe it can be the starting point for discussing more items on how to go forward with our build and CI infrastructure.

Andreas
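
P.S. To make the notions of topological ordering and "blocking builds" a bit more concrete, here is a minimal sketch in Python. It is not how cuirass or gbc are implemented; the function names, the toy dependency graph and the set of broken packages are all made up for illustration. Builds are visited so that every derivation comes after its inputs; a derivation with a failed input is never attempted and is marked "failed (dependency)", so the entries marked plain "failed" are exactly the first failures.

```python
from graphlib import TopologicalSorter  # standard library, Python >= 3.9


def classify_builds(deps, broken):
    """Simulate building everything in dependency order.

    deps   -- hypothetical graph: derivation name -> set of direct inputs
    broken -- hypothetical set of derivations whose own build fails
    """
    status = {}
    # static_order() yields each derivation only after all of its inputs.
    for drv in TopologicalSorter(deps).static_order():
        inputs = deps.get(drv, ())
        if any(status[i] != "succeeded" for i in inputs):
            # An input is missing, so this build is not even attempted.
            status[drv] = "failed (dependency)"
        elif drv in broken:
            # All inputs are available, but the build itself fails.
            status[drv] = "failed"
        else:
            status[drv] = "succeeded"
    return status


def blocking_builds(deps, broken):
    """Return the first failures: failed builds whose inputs all succeeded."""
    status = classify_builds(deps, broken)
    return sorted(drv for drv, s in status.items() if s == "failed")


# Toy example (assumed data, not taken from a real evaluation).
deps = {
    "glibc": set(),
    "gcc": {"glibc"},
    "openssl": {"gcc"},
    "curl": {"openssl"},
    "git": {"curl", "openssl"},
    "zlib": {"gcc"},
}
broken = {"openssl", "git"}

print(blocking_builds(deps, broken))  # ['openssl']
```

In this toy graph only openssl is reported: git is broken as well, but that stays hidden until openssl is repaired, which is why working on the blocking builds first pays off.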