Hello,

Christopher Baines <m...@cbaines.net> skribis:

> I've been doing some performance tuning, submitting builds is now more
> parallelised, a source of slowness when fetching builds has been
> addressed, and one of the long queries involved in allocating builds has
> been removed, which also improved handling of the WAL (Sqlite write
> ahead log).
>
> There's also a few new features. Agents can be deactivated which means
> they won't get any builds allocated. The coordinator now checks the
> hashes of outputs which are submitted, a safeguard which I added because
> the coordinator now also supports resuming the uploads of outputs. This
> is particularly important when trying to upload large (> 1GiB) outputs
> over slow connections.
>
> I also added a new x86_64 build machine. It's a 4 core Intel NUC that I
> had sitting around, but I cleaned it up and got it building things. This
> was particularly useful as I was able to use it to retry building
> guile@3.0.7, which is extremely hard to build [2]. This was blocking
> building the channel instance derivations for x86_64-linux.
>
> 2:
> https://data.guix.gnu.org/gnu/store/7k6s13bzbz5fd72ha1gx9rf6rrywhxzz-guile-3.0.7.drv

Neat!  (Though I wouldn’t say building Guile is “extremely hard”,
especially on x86_64.  :-))

The ability to keep retrying is much welcome.

> On the related subject of data.guix.gnu.org (which is the source of
> derivations for bordeaux.guix.gnu.org, as well as a recipient of build
> information), there have been a couple of changes. There was some web
> crawler activity that was slowing data.guix.gnu.org down significantly,
> NGinx now has some rate limiting configuration to prevent crawlers
> abusing the service. The other change is that substitutes for the latest
> processed revision of master will be queried on a regular basis, so this
> page [3] should be roughly up to date, including for ci.guix.gnu.org.
>
> 3:
> https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-substitute-availability

That’s good news.
That also means that things like
<https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility>
should be more up-to-date, which is really cool!  This can have a
drastic impact on how we monitor and address reproducibility issues.

> Now for some not so good things:
>
> Submitting builds wasn't working quite right for around a month, one of
> the changes I made to speed things up led to some builds being
> missed. This is now fixed, and all the missed builds have been
> submitted, but this was more than 50,000 builds. This, along with all
> the channel instance derivation builds that can now proceed mean that
> there's a very large backlog of x86 and ARM builds which will probably
> take at least another week to clear. While this backlog exists,
> substitute availability for x86_64-linux will be lower than usual.

At least it’s nice to have a clear picture of which builds are missing,
how much of a backlog we have, and what needs to be rebuilt.

> Space is running out on bayfront, the machine that runs the coordinator,
> stores all the nars and build logs, and serves the substitutes. I knew
> this was probably going to be an issue, bayfront didn't have much space
> to begin with, but I had hoped I'd be further forward in developing some
> way to allow moving the nars around between multiple machines, to remove
> the need to store all of them on bayfront. I have got a plan, there's
> some ideas I mentioned back in February [4], but I haven't got around to
> implementing anything yet. The disk space usage trend is pretty much
> linear, so if things continue without any change, I think it will be
> necessary to pause the agents within a month, to avoid filling up
> bayfront entirely.

Ah, bummer.  I hope we can find a solution one way or another.
Certainly we could replicate nars on another machine with more disk,
possibly buying the necessary hardware with the project funds.

Thanks for the update!

Ludo’.