Hello all,

I recently had a discussion with people from Nix about how they handle their package upgrades. I probably misunderstood a few things, but it may nevertheless be interesting as a source of inspiration.
Apparently they do not work with feature or team branches, as we decided to do during Guix Days 2024, but instead all commits go to a separate staging branch, which is built continually. If a round of building works out well, the branch is pushed to master. If not, the commits are dropped, and the people who submitted them need to go back to square one and submit repaired patches for the next round. I am not exactly sure how "continuous" the integration is; I suppose it is rather "discrete", in the sense that the branch is built through once, and only after that do they jump to the next iteration. (That would be different from what we do in CI or QA, where a newer commit supersedes the previous one and we always jump to the newest commit.)

So this is a hit-or-miss approach, which I find quite interesting: if something does not work, it does not hold up other work for an indefinite amount of time until the problem gets repaired; it is simply dropped for the time being. This also assumes that we accept errors and failures: not everything can be checked locally, so there is no shame in a commit not being accepted, or being reverted, in a given round. My general impression is that in the end the throughput is better than what we have in Guix now.

As I understood it, there is one (!) person responsible for accepting or rejecting the packages into master, in some kind of BDFL role... To me this looks like too much work for a single person, and, as so often, something that should rotate within a group of people. But clearly, as with the current branch merging process, we need to establish some kind of authority: everyone loves their own patches updating or repairing their favourite packages, so someone else needs to make the unpleasant decision to drop the commits.

I got the impression that more concerted effort is deployed for big changes (like the new version of gcc we have in core-packages-team, or similar ones), but I do not know how this fits with the general staging branch approach.

Maybe something we could learn from this (Hegel, here comes the synthesis!) is how to handle changes that require many rebuilds but (so far at least) are not handled by a team: bc, perl, and so on. So far these have tended to linger indefinitely in the issue tracker. One approach could be to create a staging branch as described above, grouping together a fixed (see below) number of pull requests. The problem is to make sure it does not end up like the former core-updates branch, where people dumped their world-rebuild changes without supervision, so that after a while nobody had any idea of what was on the branch, and the commits and follow-up commits repairing breakage were so entangled that one could only go forward by adding more commits and could not go back.

So my suggestion would be to take a fixed set of commits and apply them to a branch which, from that point on, is not touched anymore; after building it on CI or QA, either the complete branch is pushed to master, or it is dropped and a new trial is made. Repairing things on the branch would be forbidden. Maybe the staging branch shepherds could make a second trial with a subset of the previous commits, if they manage to determine which of them caused problems. Codeberg could help us here, since we could create a milestone containing the pull requests on the branch, so we have a good idea of what is there and what is not.
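To make this concrete, here is a rough sketch of what one all-or-nothing round could look like in git terms. The branch name is made up, and I am assuming the shepherds cherry-pick the commits of the selected pull requests; the exact mechanics would of course be up to them:

    # Start a frozen staging round from the current master (hypothetical
    # branch name "staging-round-1").
    git fetch origin master
    git checkout -b staging-round-1 origin/master

    # Apply the fixed set of commits from the pull requests chosen for
    # this round; after this the branch is not touched anymore.
    git cherry-pick <commits-from-selected-pull-requests>
    git push origin staging-round-1

    # ... wait for CI or QA to build the branch once ...

    # If the round succeeds, master is fast-forwarded to the branch tip
    # (assuming master has not moved in the meantime):
    git checkout master && git merge --ff-only staging-round-1
    git push origin master

    # If it fails, the branch is simply deleted, and a new round is
    # started, possibly with a subset of the previous pull requests:
    git push origin --delete staging-round-1

The important point is that once the branch is pushed, nothing more is committed to it; the only possible outcomes are a fast-forward of master or deletion of the branch.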
The pull requests would then be closed once they land in master, and not once they land in the branch, which is what we currently tend to do on the assumption that whatever is in a branch will end up in master in some form or another. And such a branch could be interleaved with our current team branches.

This should not be confused with a merge train. If I understand it correctly, a merge train of n pull requests lets the CI system run one pipeline per request, in parallel, each pipeline testing the request together with all the requests queued ahead of it; when a pipeline fails, that request is dropped from the train and the pipelines behind it are restarted. Given how long it takes to run one evaluation of Guix and build all its packages, this is not feasible for us.

Whatever we do, we need more reliable tooling and CI. If all goes well, I estimate that we can build all of Guix in a few days, maybe three? But for CI as well as QA we sometimes have outages of several days where they are stuck on some bug (of the software, or the hardware, or tricky combinations of both, like garbage collection grinding everything else to a halt). It is difficult to say how this could be improved; probably by having more people look after things and by improving the bayfront hardware as described elsewhere:
https://codeberg.org/guix/maintenance/issues/13
https://codeberg.org/guix/maintenance/issues/14
and so on.

What do you think?

Andreas