Hello all,

recently I had a discussion with people from Nix about how they handle
their package upgrades. I probably misunderstood a few things, but it
might nevertheless be interesting as a source of inspiration.

Apparently they do not work with feature or team branches, as we
decided to do during Guix Days 2024; instead, all commits go to a
separate staging branch, which is built continually. If this works out
well, the branch is pushed to master. If not, the commits are dropped,
and the people who submitted them need to go back to square one and
submit repaired patches for the next round. I am not exactly sure how
"continuous" the integration is; I suppose it is rather "discrete", in
the sense that the branch is built once in full, and after that they
jump to the next iteration. (That would be different from what we do in
CI or QA, where a newer commit supersedes the previous one, and we
always jump to the newest commit.)

So this is a hit-or-miss approach, which I find quite interesting;
if something does not work, it does not hold up other work for an
indefinite amount of time until the problem is fixed, but is simply
dropped for the time being. This also requires that we accept errors and
failures: not everything can be checked locally, so there is no shame in
a commit not being accepted, or being reverted, in a given round.
My general impression is that in the end the "throughput" is better
than what we have in Guix now.

As I understood it, there is one (!) person responsible for accepting
or rejecting the packages into master, in some kind of BDFL role... To
me this looks like too much work for a single person, and, as so often,
like something that should rotate among a group of people. But clearly,
as with the current branch merging process, we need to establish some
kind of authority: everyone loves their own patches updating or
repairing their favourite packages, so someone else needs to make the
unpleasant decision to drop the commits.

I got the impression that a more concerted effort is made for big
changes (like the new version of gcc we have on core-packages-team,
or similar ones), but I do not know how this fits together with the
general staging branch approach.

Maybe something we could learn from this (Hegel, here comes the
synthesis!) is how to handle changes that require many rebuilds but
(so far at least) are not handled by a team: bc, perl, and so on.
So far these have tended to linger indefinitely in the issue tracker.
One approach could be to create a staging branch as described above,
grouping together a fixed (see below) number of pull requests. The
problem here is to make sure it does not end up like the former
core-updates branch, where people dumped their world-rebuild changes
without oversight, so that after a while nobody had an idea of what was
on the branch, and the commits and follow-up commits repairing breakage
were so entangled that one could only go forward by adding more commits,
never back.
So my suggestion would be to take a fixed set of commits and apply
them to a branch, which from that point on is not touched anymore; after
building it on CI or QA, either the complete branch is pushed to master,
or it is dropped and a new attempt is made. Repairing things on the
branch would be forbidden. Maybe the staging branch shepherds could make
a second attempt with a subset of the previous commits, if they manage
to determine which of them caused problems. Codeberg could help us here,
since we could create a milestone containing the pull requests on the
branch, so that we have a good idea of what is there and what is not.
The pull requests would be closed once they land in master (and not once
they land in the branch, as we currently tend to do on the assumption
that whatever is on a branch will end up in master in some form or
another). And such a branch could be interleaved with our current team
branches.
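
To make this concrete, here is a minimal sketch in Python (just for
illustration; the PR numbers are made up, and the refs/pull/<N>/head
convention is my assumption about how Codeberg/Forgejo exposes pull
requests) of how such a frozen staging branch could be assembled from
a fixed set of pull requests:

    # Assemble a one-shot staging branch from a fixed list of pull
    # requests; the branch is then frozen and either lands in master as
    # a whole after a successful CI/QA run, or is dropped entirely.
    import subprocess

    def git(*args):
        subprocess.run(["git", *args], check=True)

    pull_requests = [1234, 2345, 3456]   # hypothetical PR numbers

    # Fetch master plus the head of each pull request into a local
    # pr-<N> branch (assumes a clone with an "origin" remote).
    git("fetch", "origin", "master",
        *[f"refs/pull/{n}/head:refs/heads/pr-{n}" for n in pull_requests])

    git("checkout", "-b", "staging-round-1", "origin/master")
    for n in pull_requests:
        # One merge commit per PR (--no-ff) makes it easy to see which
        # PRs are part of the round, and to leave one out in a second
        # attempt if it turns out to be the culprit.
        git("merge", "--no-ff", f"pr-{n}")
    # No commits after this point: the branch is built once on CI or QA
    # and then pushed to master as a whole, or discarded.

Keeping one merge per pull request would also make the bookkeeping in
the corresponding milestone trivial.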

This should not be confused with a merge train. If I understand it
correctly, a merge train of n commits consists of letting the CI system
run, in parallel, all 2^n - 1 non-empty combinations of including or
excluding each commit; out of the successful runs, it then chooses one
with a maximal set of commits. Given how long it takes to run one
evaluation of Guix and build all its packages, this is not feasible.
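
Taking that description at face value, a quick back-of-the-envelope
computation shows how fast this blows up (the commit counts below are
arbitrary examples):

    # CI runs needed if every non-empty subset of n candidate commits
    # were built independently: 2^n - 1.
    for n in (5, 10, 20):
        print(f"{n} commits -> {2**n - 1} CI runs")
    # 5 commits -> 31 CI runs
    # 10 commits -> 1023 CI runs
    # 20 commits -> 1048575 CI runs

Even a modest batch of pending world-rebuild changes would thus require
dozens or hundreds of full builds.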

Whatever we do, we need more reliable tooling and CI; if all goes well,
I estimate that we can build all of Guix in a few days, maybe three?
But both CI and QA sometimes have outages of several days where they are
stuck on some bug (in the software, or the hardware, or tricky
combinations of both, like garbage collection grinding everything else
to a halt). It is difficult to say how this can be improved; probably by
having more people look after things and by improving the bayfront
hardware as described elsewhere:
   https://codeberg.org/guix/maintenance/issues/13
   https://codeberg.org/guix/maintenance/issues/14
and so on.

What do you think?

Andreas

