Hello!

Last month, we discussed¹ slow progress with builds (and ‘core-updates’
in particular) on ci.guix, especially on AArch64 and POWER9.  Those
turned out to be mostly due to scalability issues in Cuirass.  Likewise,
the front page at https://ci.guix.gnu.org was timing out for almost two
weeks².

I’m happy to report that the first class of problems is mostly fixed;
timeouts are not entirely gone, but they’re less frequent.  Some details
about the work done:

  • I learned a lot from Chris about all things PostgreSQL (I even
    learned that phrases like “database administrator” are a thing).
    Chris provided invaluable suggestions to optimize SQL queries that
    were taking too long, as was the case on the front page, and to
    tweak PostgreSQL configuration.

      https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=f60e73b7b1e906349d2355d37807514c6e667f0c
      https://git.savannah.gnu.org/cgit/guix/maintenance.git/commit/?id=d98e1f76501d368e67a8e57455195590880283f8

  • The infamous “missing derivation” issue that has been causing
    spurious build failures may be coming to an end: as suggested by
    Chris, ‘cuirass remote-worker’ now has an explicit step to
    substitute .drv store items and it keeps retrying for a while when
    that fails:

      https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=2365ba786c805477fcbae6eaeb358b0dd0501598

  • ‘cuirass remote-server’ doesn’t use the database anymore to store
    transient worker information (“last seen” time), which reduces
    pressure on the database and increases throughput:

      https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=e9f83e43f066cdc8bb4bec6ba221ade4ef7cab7b

  • A bunch of corner cases (stuck builds, etc.) are now better handled
    by restarting, rescheduling, or canceling as makes most sense.

  • We upgraded the Honeycombs (AArch64) and POWER9 build machines.  At
    this stage 3 AArch64 and 2 POWER9 build machines are fully
    operational behind ci.guix:

      https://ci.guix.gnu.org/workers

    Another Honeycomb, grunewald, is undergoing maintenance at the MDC
    and should be back soon.
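
For readers curious about the retry step mentioned in the “missing
derivation” item above, here is a minimal sketch of the general idea.
It is not the actual Cuirass code (Cuirass is written in Guile Scheme);
the function name, the ‘fetch’ callback, and the attempt count and delay
are all hypothetical, chosen only for illustration:

```python
import time
from typing import Callable


def substitute_with_retry(
    fetch: Callable[[str], bool],
    drv: str,
    attempts: int = 5,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Try to fetch DRV up to ATTEMPTS times; return True on success.

    FETCH is a hypothetical callback standing in for whatever actually
    substitutes the .drv store item; it returns True when the item is
    available locally afterwards.
    """
    for attempt in range(1, attempts + 1):
        if fetch(drv):
            return True
        if attempt < attempts:
            # Wait a bit longer after each failure before retrying.
            sleep(delay * attempt)
    # Every attempt failed: the caller decides how to report it.
    return False
```

The point of such a loop is that a .drv that is briefly unavailable
causes a delay on the worker rather than an immediate build failure.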

Substitute availability is back to 94% for x86_64 for ‘core-updates’;
other architectures are still lacking but that’ll hopefully improve over
the coming days:

  https://qa.guix.gnu.org/branch/core-updates

These things require constant attention.  If you notice anything
suspicious, feel free to bring it up here or on IRC!

Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00149.html
² https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00312.html
