Hey,

It's been a long while since I sent out an update email regarding
automated testing for patches. Mid last year I did send out an email
about starting a Quality Assurance Meeting/Team/Sociocracy circle [1]
and late last year I sent out an email about data.qa.guix.gnu.org moving
server [2].

1: https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00217.html
2: https://lists.gnu.org/archive/html/guix-devel/2024-12/msg00249.html

My plan for last year was to spend much more time working on the Guile
guix-daemon. That didn't happen though, and instead I ended up trying to
keep QA and the surrounding services running and maybe even sometimes
working.

Last year at least it didn't feel like much progress was being made, but
there's been a burst of activity in the last few months, in particular:

 - Some significant improvements have been made to how the data service
   processes revisions, and I think I now understand what is required to
   improve things further.

 - data.qa.guix.gnu.org has much better hardware and I'm no longer
   renting a machine for it to run on, plus data.guix.gnu.org is being
   hosted by the Guix Foundation rather than me personally.

 - Guile Knots now exists. Before, I was copying useful code for
   working with fibers between projects, but now I've extracted most of
   that into a library. It's not stable or documented yet but I'm
   hoping to get to that soon.

Putting aside the more minor issues, I think the main problem with QA
last year was that it was working OK sometimes for patches, but not
working for branches, and that meant that when a branch was merged it
would cause the substitute availability to drop for an extended period
and QA would stop being able to test patches. I think that's still the
main problem.

I'm not sure there's a big change required to get QA working for
branches. My thinking so far is that it's a lot of little fixes and
improvements. Some of the known issues are:

 - Segfaults! Both the build coordinator and qa-frontpage are
   segfaulting. The rate varies, but I saw the qa-frontpage segfaulting
   up to 20 times in one day. This mostly seems associated with
   guile-gnutls/gnutls. Here's one issue with a backtrace from the
   build-coordinator [3]; interestingly the qa-frontpage fails in
   different ways.

3: https://gitlab.com/gnutls/guile/-/issues/29

 - Performance: the build coordinator needed some more performance work
   recently, and there will probably be more to do. I also didn't
   realise how significant the problem of excessive GC duration was for
   the build coordinator. I've made some tweaks and deployed the fix for
   [4] now, and that seems to have helped, but there's more to do.

4: https://github.com/wingo/fibers/issues/109

On top of that, I've put a rough roadmap on [5] (here in Git [6]). This
includes things like the build coordinator segfault issue, but also more
general issues like the monitoring and observability shortcomings.

5: https://bordeaux.guix.gnu.org/
6: https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/html/bordeaux/home.html#n59

Personally I think I'm going to try and prioritise some of the items on
this roadmap [5], especially "Automatic nar removal policy" since that's
the last element in the design that isn't implemented and it's been
missing for far too long.

Do let me know if you have any comments or questions.

Thanks,

Chris
