Ludovic Courtès <l...@gnu.org> writes:

> Hey Tomas,
>
> Ludovic Courtès <l...@gnu.org> skribis:
>
>> I tried the config file you gave with:
>>
>>   ./pre-inst-env guix system vm /tmp/config.scm
>>
>> and it hangs, to my surprise (I’ve been using ‘system-log’ on my laptop
>> since June, and “make check-system TESTS=basic” & co. pass).
>
> After spending hours on this and fixing improbable issues in the
> Shepherd (will push shortly), I found that the root of the problem is
> exactly what I feared and which led to the patches at
> <https://issues.guix.gnu.org/76262>.
>
> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
> it loses the race and waits forever.

One observation.  While I agree, based on the description, that it
comes down to (bad) luck, in practice the hang is extremely reliable to
reproduce.

At first I struggled to reproduce it again: it did not hang even a
single time (out of 5 tries) on the bad commit.  But once I reverted my
configuration to what it was back then (== removed a few shepherd
timers), the hang started happening every single time.

So, while in theory it should be a probabilistic problem, in practice it
does not seem to be the case.  Not sure where I am going with this, I
just think it is interesting.
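
For reference, here is a minimal sketch of the kind of race described
above.  It is in Python rather than Guile and is not the Shepherd's
actual code; 'on_sigchld' and 'run_race' are made-up names.  It forces
the SIGCHLD handler to win, so the direct 'waitpid' call always loses.
In plain POSIX the loser gets ECHILD right away; how losing turns into
waiting forever inside shepherd is not modelled here.

```python
import os
import signal
import time

reaped = []  # (pid, exit code) pairs collected by the SIGCHLD handler


def on_sigchld(signum, frame):
    # Hypothetical handler: reap every terminated child without blocking,
    # the way a service manager's SIGCHLD handler typically does.
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none has exited yet
        reaped.append((pid, os.WEXITSTATUS(status)))


def run_race():
    """Fork a child, let the handler reap it, then call waitpid directly."""
    reaped.clear()
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        pid = os.fork()
        if pid == 0:
            os._exit(7)  # child: exit immediately with a recognizable code

        # Wait until the handler has reaped the child, so that the direct
        # call below is guaranteed to lose the race.
        deadline = time.monotonic() + 5
        while not reaped and time.monotonic() < deadline:
            time.sleep(0.01)

        try:
            os.waitpid(pid, 0)
            outcome = "won"
        except ChildProcessError:
            # ECHILD: the exit status was already consumed by the handler.
            outcome = "lost"
        return outcome, pid, list(reaped)
    finally:
        # Restore whatever SIGCHLD disposition was in place before.
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print(run_race())
```

The point is only that the exit status of a child can be collected
once: whichever 'waitpid' gets there second never sees it.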

>
> Could you try your config with the patch at
> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
> the metal?

I have reverted your revert (commit 1) and applied patch 2 on top of
that (commit 2).

Steps I took (both in VM and on a spare laptop):

1. Reconfigure from commit 1.
2. Ensure it still hangs (5x).
3. Reconfigure from commit 2.
4. Ensure it no longer hangs (5x).

I can confirm that patch 2 fixes the issue for me, both in the VM and on
the physical machine.

The only thing I have noticed is that, even when deploying the "good"
commit, I see the following error in the log:

--8<---------------cut here---------------start------------->8---
guix deploy: warning: an error occurred while upgrading services on '127.0.0.1':
%exception #<inferior-object #<&service-not-found-error service: system-log>>
--8<---------------cut here---------------end--------------->8---

The system comes up fine after reboot though.

>
> Thanks in advance,
> Ludo’.

Thank you for figuring this one out. :)

Tomas

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
