Ludovic Courtès <l...@gnu.org> writes:

> Hey Tomas,
>
> Ludovic Courtès <l...@gnu.org> skribis:
>
>> I tried the config file you gave with:
>>
>>   ./pre-inst-env guix system vm /tmp/config.scm
>>
>> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
>> since June, and “make check-system TESTS=basic” & co. pass).
>
> After spending hours on this and fixing improbable issues in the
> Shepherd (will push shortly), I found that the root of the problem is
> exactly what I feared and which led to the patches at
> <https://issues.guix.gnu.org/76262>.
>
> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
> it loses the race and waits forever.

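For anyone following along, the race described above can be sketched
roughly as follows. This is only a toy illustration in Python, not the
Shepherd's or Guix's actual code; the names `on_sigchld`, `run_and_wait`
and `reaped` are made up for the example. It assumes a POSIX system with
`fork`. A SIGCHLD handler reaps every exited child, and a second, explicit
`waitpid` for the same child then finds nothing left to collect. In this
toy the loser gets an error rather than hanging; the point is only that
two waiters compete for a single exit status.

```python
import os
import signal
import time

reaped = []  # (pid, status) pairs collected by the SIGCHLD handler


def on_sigchld(signum, frame):
    # Reap every terminated child, the way an init-style handler would.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped.append((pid, status))


def run_and_wait():
    """Fork a child, then try to wait for it explicitly."""
    signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately
        # Let the handler win the race on purpose, so the outcome
        # does not depend on scheduling.
        while not reaped:
            time.sleep(0.01)
        try:
            os.waitpid(pid, 0)  # the competing, explicit wait
            return "explicit waitpid got the status"
        except ChildProcessError:
            return "handler already reaped the child"
    finally:
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)


if __name__ == "__main__":
    print(run_and_wait())
```

Because the loop deliberately waits until the handler has run, the
explicit `waitpid` should always lose here. In real code the ordering
depends on timing, which is what makes this class of bug look random.
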
An observation here. While I agree, based on the description, that it
comes down to (bad) luck, in practice it seems to be extremely reliable to
reproduce. At first I struggled to reproduce it again: it did not hang
even a single time (out of 5 tries) on the bad commit. But once I reverted
my configuration to what it was back then (== removed a few shepherd
timers), the hang started happening every single time.

So, while in theory it should be a probabilistic problem, in practice it
does not seem to be the case. Not sure where I am going with this, I just
think it is interesting.

>
> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?

I have reverted your revert and applied patch 2 on top of that. Steps I
took (both in a VM and on a spare laptop):

1. Reconfigure from commit 1.
2. Ensure it still hangs (5x).
3. Reconfigure from commit 2.
4. Ensure it no longer hangs (5x).

I can confirm that patch 2 fixes the issue for me, both in the VM and on a
physical machine.

The only thing I have noticed is that, even when deploying the "good"
commit, I see the following error in the log:

--8<---------------cut here---------------start------------->8---
guix deploy: warning: an error occurred while upgrading services on '127.0.0.1': %exception #<inferior-object #<&service-not-found-error service: system-log>>
--8<---------------cut here---------------end--------------->8---

The system comes up fine after reboot though.

>
> Thanks in advance,
> Ludo’.

Thank you for figuring this one out. :)

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.