> I have a lot of custom Shepherd services. Every so often I make a
> mistake that stalls the step in 'guix deploy' that upgrades Shepherd
> services, but without any error messages.
>
> Unfortunately, I can also no longer run 'herd status', which likewise
> hangs, or 'reboot'. How may I debug such issues in my operating-system
> declaration, please?

Ludo, this is the kind of issue for which extensive logging is needed.

i.e. there's no self-contained reproducer (or is there, Felix?), and
it requires a live environment to experience it.

and i suspect that i may even have fixed this in one of the commits
that cleans up shepherd's error handling. one of the issues i remember
is that an exception from the start (or stop?) GEXP of a service
sometimes brought shepherd into a non-responsive state (without any
sign of it in its logs).

Felix, i'm planning to rebase my branch on Ludo's devel branch. it's
not trivial because Ludo continues hacking shepherd, but i'll
hopefully do it in the next few days. after that you may give it a try
and see if you experience this issue again, and if you do then you can
have plenty of logs to give you a clue why/how it happens.

if you do have a reproducer, then i'd be interested in adding it as a
test in the shepherd codebase.

https://codeberg.org/attila-lendvai-patches/shepherd/commits/branch/various

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“It is humiliating to realize that when you drive yourself underground,
when you fake who you are, often you do so for people you do not even
like or respect.”
	— Nathaniel Branden (1930–2014)