Hi!

Mathieu Othacehe <othac...@gnu.org> skribis:

>> Something is going wrong here! I'll keep investigating.
>
> To help us investigate those issues I added a "/status" page, which is
> also accessible from a new drop-down menu in the Cuirass navigation bar.
>
> See, https://ci.guix.gnu.org/status.

Nice!  So it’s roughly like the info at /api/queue, but filtered to
running builds, right?

> Hydra has the same interface, but also a "Machine status" page, that
> breaks down the running builds machine per machine. I plan to implement
> that one next. Reading Hydra code, I also discovered that some part of
> the offloading is directly done from Hydra, which talks with the
> nix-daemon of the connected build machines, interesting!

Yes, Hydra does most of the scheduling by itself.  Since this is
redundant with what the daemon + offload do, I thought Cuirass shouldn’t
do any scheduling at all and instead let the daemon take care of it all.
This has advantages (the daemon has a global view and can achieve better
scheduling), and drawbacks (the protocol requires us to wait for
‘build-things’ completion before we can queue more builds, and
scheduling decisions are almost invisible to Cuirass).

> While I'm writing, we have 5 running builds for ~1 hour, and 76040 queued
> builds. Given the computing power of Berlin, there must be a bottleneck
> somewhere.

Yes!  I’ve often run “guix processes” on berlin and then straced the
‘SessionPID’ process.  It’s insightful because you sometimes see that
the daemon is stuck waiting for a machine to offload to, and sometimes
that it’s stuck waiting for a build that will perhaps just eventually
time out…

Ludo’.
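
PS: The “guix processes” + strace routine mentioned above could be
sketched roughly as follows.  This is only an illustration: it assumes
the recutils-style output of “guix processes” (records separated by
blank lines, with ‘SessionPID’ and ‘ClientCommand’ fields), the SAMPLE
text is made up, and the helper names ‘parse_records’ and ‘session_pids’
are hypothetical.  It prints the strace command rather than running it.

```python
import os

# Made-up sample in the shape of "guix processes" output (an assumption
# for illustration; real output has more fields and more records).
SAMPLE = """\
SessionPID: 19002
ClientPID: 19090
ClientCommand: guix shell python

SessionPID: 19444
ClientPID: 19419
ClientCommand: cuirass --cache-directory /var/cache/cuirass
ChildPID: 20495
ChildCommand: guix offload x86_64-linux 7200 1 28800
"""


def parse_records(text):
    """Split recutils-style text into a list of {field: [values]} dicts."""
    records = []
    for block in text.strip().split("\n\n"):
        record = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # skip lines that are not of the form "Field: value"
                record.setdefault(key.strip(), []).append(value.strip())
        if record:
            records.append(record)
    return records


def session_pids(text, client="cuirass"):
    """Return the SessionPIDs of sessions whose client program is CLIENT."""
    pids = []
    for record in parse_records(text):
        words = record.get("ClientCommand", [""])[0].split()
        # Compare the base name so a full store path would match too.
        if words and os.path.basename(words[0]) == client:
            pids.extend(int(pid) for pid in record.get("SessionPID", []))
    return pids


if __name__ == "__main__":
    for pid in session_pids(SAMPLE):
        # Only print the command; attaching to the daemon needs root.
        print(f"strace -f -p {pid}")
```

On a real machine one would presumably feed it the actual output of
“guix processes” instead of SAMPLE.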