Hello!

Last month, we discussed¹ slow progress with builds (and ‘core-updates’ in particular) on ci.guix, especially on AArch64 and POWER9. Those turned out to be mostly due to scalability issues in Cuirass. Likewise, the front page at https://ci.guix.gnu.org was timing out for almost two weeks².

I’m happy to report that the first class of problems is mostly fixed, and timeouts are not entirely gone, but they’re less frequent. Some details about the work done:

  • I learned a lot from Chris about all things PostgreSQL (I even learned that phrases like “database administrator” are a thing). Chris provided invaluable suggestions to optimize SQL queries that were taking too long, as was the case on the front page, and to tweak PostgreSQL configuration.

    https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=f60e73b7b1e906349d2355d37807514c6e667f0c
    https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=d98e1f76501d368e67a8e57455195590880283f8

  • The infamous “missing derivation” issue that has been causing spurious build failures may be coming to an end: as suggested by Chris, ‘cuirass remote-worker’ now has an explicit step to substitute .drv store items and it keeps retrying for a while when that fails:

    https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=2365ba786c805477fcbae6eaeb358b0dd0501598

  • ‘cuirass remote-server’ doesn’t use the database anymore to store transient worker information (“last seen” time), which reduces pressure on the database and increases throughput:

    https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=e9f83e43f066cdc8bb4bec6ba221ade4ef7cab7b

  • A bunch of corner cases (stuck builds, etc.) are now better handled by restarting, rescheduling, or canceling, whichever makes the most sense.

  • We upgraded the Honeycombs (AArch64) and POWER9 build machines.

At this stage 3 AArch64 and 2 POWER9 build machines are fully operational behind ci.guix:

  https://ci.guix.gnu.org/workers

Another Honeycomb, grunewald, is undergoing maintenance at the MDC and should be back soon.

Substitute availability is back to 94% for x86_64 for ‘core-updates’; other architectures are still lacking but that’ll hopefully improve over the coming days:

  https://qa.guix.gnu.org/branch/core-updates

These things require constant attention.
If you notice anything suspicious, feel free to bring it up here or on IRC!

Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00149.html
² https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00312.html