Hey,

It's been a long while since I sent out an update email regarding
automated testing for patches. Mid last year I did send out an email
about starting a Quality Assurance Meeting/Team/Sociocracy circle [1],
and late last year I sent out an email about data.qa.guix.gnu.org
moving server [2].

1: https://lists.gnu.org/archive/html/guix-devel/2024-06/msg00217.html
2: https://lists.gnu.org/archive/html/guix-devel/2024-12/msg00249.html

My plan for last year was to spend much more time working on the Guile
guix-daemon. That didn't happen though, and instead I ended up trying
to keep QA and the surrounding services running and maybe even
sometimes working.

Last year at least, it didn't feel like much progress was being made,
but there's been a burst of activity in the last few months, in
particular:

 - Some significant improvements have been made to how the data
   service processes revisions, and I think I now understand what is
   required to improve things further.

 - data.qa.guix.gnu.org has much better hardware and I'm no longer
   renting a machine for it to run on, plus data.guix.gnu.org is being
   hosted by the Guix Foundation rather than me personally.

 - Guile Knots now exists. Before, I was copying useful code for
   working with fibers between projects, but now I've extracted most
   of that into a library. It's not stable or documented yet, but I'm
   hoping to get to that soon.

Putting aside the more minor issues, I think the main problem with QA
last year was that it was working OK sometimes for patches, but not
working for branches. That meant that when a branch was merged, it
would cause the substitute availability to drop for an extended period
and QA would stop being able to test patches.

I think that's still the main problem. I'm not sure there's a big
change required to get QA working for branches; my thinking so far is
that it's a lot of little fixes and improvements. Some of the known
issues are:

 - Segfaults! Both the build coordinator and qa-frontpage are
   segfaulting. The rate varies, but I saw the qa-frontpage
   segfaulting up to 20 times in one day. This mostly seems associated
   with guile-gnutls/gnutls. Here's one issue with a backtrace from
   the build coordinator [3]. Interestingly, the qa-frontpage fails in
   different ways.

3: https://gitlab.com/gnutls/guile/-/issues/29

 - Performance. The build coordinator needed some more performance
   work recently, and there will probably be more to do. I also didn't
   realise how significant the problem of excessive GC duration was
   for the build coordinator. I've made some tweaks and deployed the
   fix for [4] now, and that seems to have helped, but there's more to
   do.

4: https://github.com/wingo/fibers/issues/109

On top of that, I've put a rough roadmap on [5] (here in Git [6]).
This includes things like the build coordinator segfault issue, but
also more general issues like the monitoring and observability
shortcomings.

5: https://bordeaux.guix.gnu.org/
6: https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/html/bordeaux/home.html#n59

Personally, I think I'm going to try and prioritise some of the items
on this roadmap [5], especially "Automatic nar removal policy", since
that's the last element in the design that isn't implemented and it's
been missing for far too long.

Do let me know if you have any comments or questions.

Thanks,

Chris