bug#77634: [shepherd] Test failures on GNU/Hurd

yelninei--- via Bug reports for GNU Guix Sat, 19 Apr 2025 02:05:37 -0700

Hi Ludo,


Apr 14, 2025, 14:53 by yelni...@tutamail.com:

>
>
>>> The shepherd syslog seems to be extremely slow:
>>>
>>> I extracted logger.scm and conf.scm from the test, removed the $
>>> from the shell variables, started shepherd, echoed the test
>>> message into the kmsg file, and then tried to start the shepherd syslog.
>>>
>>> It took multiple minutes to see the "Service system-log running with
>>> value" in the log and then another couple of minutes for "herd start
>>> syslogd" to return. Afterwards trying to query the syslog status (or
>>> trying to interact with shepherd in any other way) takes forever to
>>> complete while syslogd is running.
>>>
>>> I did this with the 1.0.4rc1 shepherd to work around the blocking
>>> socket problem with guix-daemon (#77610; it still fails on the first
>>> connection, but I don't know whether this is a problem with shepherd
>>> or with guix-daemon).
>>>
>>> With the 1.0.3 shepherd, syslogd seems to be a lot quicker to start
>>> initially (see the log from the first message) but is still extremely
>>> slow to interact with afterwards.
>>>
>>
>> Weird.  Could you bisect between 1.0.3 and HEAD to try and find the
>> source of slowness?
>>
>> I wonder if f1a82845eaf8851af9a811e5a1d185b68b1c363f might explain it,
>> but that’s pre-1.0.3.
>>
>> Alternatively, you can try running ‘shepherd’ under ‘rpctrace’ for the
>> syslog slowness issue, so we get an idea of what it’s doing.
>>
>> Thanks a lot for helping out!
>>
>> Ludo’.
>>
> It is still slow on 1.0.3 to do anything once it's started, but with 1.0.4rc1
> even starting is extremely painful.
>
> I tried manually adding features to the system-log-service, and the thing that
> causes everything to slow down drastically is the addition of
> #:kernel-log-file (even if the file is empty).
>
> Setting that to #f instead makes the test fail considerably faster (1.8 s
> rather than 900 s).
>
>
>
I think the issue is that shepherd constantly reads the EOF object from the
input port (I guess the port always appears readable to Fibers even though
there is nothing to read). Adding a print to that case leads to similar
unresponsiveness on Linux as well.
So my theory is that Fibers is struggling to suspend the syslog loop.
When I added a (sleep 1) to "wait-for-input-or-message", the issue went away.

I am actually impressed that this seems to be less of a problem on Linux.
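To illustrate the theory above, here is a minimal Guile sketch of a read loop that backs off on EOF instead of immediately retrying. It is hypothetical and is not shepherd's actual code: `read-log-lines`, its parameters, and the `max-eofs` bound are invented for illustration, and it assumes (as guessed above) that the port always looks readable but only ever yields EOF. In a Fibers program, the back-off could be something like (lambda () (sleep 1)) using Fibers' `sleep`, which suspends only the current fiber.

```scheme
(use-modules (ice-9 rdelim))            ; for read-line

;; Hypothetical helper, not shepherd's code: read lines from PORT and pass
;; each one to HANDLE-LINE.  On EOF, call BACK-OFF before retrying rather
;; than looping again right away, so that a port which always polls as
;; readable but only yields EOF cannot turn this into a busy loop.
;; MAX-EOFS bounds the retries so that the sketch terminates; the return
;; value is the number of times BACK-OFF was called.
(define (read-log-lines port handle-line back-off max-eofs)
  (let loop ((eofs 0))
    (let ((line (read-line port)))
      (cond ((not (eof-object? line))
             (handle-line line)
             (loop eofs))
            ((< eofs max-eofs)
             (back-off)                 ; e.g. Fibers' (sleep 1)
             (loop (+ eofs 1)))
            (else eofs)))))

;; Usage: a string port holding two lines, after which it only yields EOF.
(define seen '())
(define back-offs
  (read-log-lines (open-input-string "first\nsecond\n")
                  (lambda (line) (set! seen (cons line seen)))
                  (lambda () #t)        ; no-op back-off for this example
                  3))
;; seen      => ("second" "first")
;; back-offs => 3
```

Bounding the retries is only there to keep the example finite; a real syslog loop would keep waiting, and the important part is that each EOF leads to a suspension rather than an immediate re-read.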
I then tried to fix the test by adjusting the first case of the test's message
destination to:

(and=> (system-log-message-sender message)
       (lambda (addr) (not (= AF_UNIX (sockaddr:fam addr)))))

After adjusting some of the sleep times in the test, it now passes, but of course
this is not really a fix.

Some other minor things:
- Currently the NUL byte from the sender is included in the log message; it
  might be worth filtering out.
- "herd status syslogd" does not show the endpoints and kernel-log-file that
  the syslog is reading from.
- In 'set-fibers-directory, the shepherd package from the shepherd channel also
  sets the Fibers ccache directory to the Fibers source directory.
- I am unable to cross-compile shepherd from Git with Guix because of help2man.
- The Guix shepherd-syslog service does not currently expose the endpoints and
  kernel-log-file settings (on my childhurd the kernel log file would be
  /dev/klog instead of /dev/kmsg).
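For the NUL byte point, a minimal Guile sketch of the filtering could look like the following. `strip-nul-chars` is a hypothetical name, not an existing shepherd procedure, and where such filtering would actually hook into shepherd's syslog code is left open. It relies on Guile's built-in SRFI-13 `string-delete`, which takes the character (or predicate) first.

```scheme
;; The NUL character, built portably rather than via a character literal.
(define nul (integer->char 0))

;; Hypothetical helper: remove every NUL character that a sender (for
;; example a syslog client writing to a Unix socket) included in its
;; message.  Guile's string-delete accepts a character as its first
;; argument and returns a new string without those characters.
(define (strip-nul-chars message)
  (string-delete nul message))

;; Usage:
(define cleaned (strip-nul-chars (string-append "hello" (string nul))))
;; cleaned => "hello"
```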
