Hello Ludo,

Apr 13, 2025, 21:09 by l...@gnu.org:


>> + herd -s t-socket-8783 start log-directory-not-writable
>> ++ cat t-log-8783
>> 2025-04-08 07:53:13 GNU Shepherd 1.0.3 (Guile 3.0.9, i586-unknown-gnu)
>> 2025-04-08 07:53:13 Starting service root...
>> 2025-04-08 07:53:13 Service root started.
>> 2025-04-08 07:53:13 Service root running with value #<<process> id: 8788 
>> command: #f>.
>> 2025-04-08 07:53:13 Service root has been started.
>> 2025-04-08 07:53:13 Configuration successfully loaded from 't-conf-8783'.
>> 2025-04-08 07:53:13 Starting service log-directory-does-not-exist...
>> 2025-04-08 07:53:13 Service log-directory-does-not-exist has been started.
>> 2025-04-08 07:53:13 Service log-directory-does-not-exist started.
>> 2025-04-08 07:53:13 Service log-directory-does-not-exist running with value 
>> #<<process> id: 8792 command: 
>> ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" 
>> "-c" "while true; do echo logging; sleep 0.2; done")>.
>> 2025-04-08 07:53:13 Starting service log-directory-not-writable...
>> Terminated
>> ++ rm -f t-log-8783 t-socket-8783 t-conf-8783 t-pid-8783
>> ++ rm -rf 
>> /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-log-directory-8783
>> ++ kill 8788
>> FAIL tests/logging-failure.sh (exit status: 124)
>>
>
> Apparently the ‘shepherd’ process was terminated by the ‘timeout’
> command, meaning that the test hanged for more than 3 minutes.
>
> I failed to reproduce it on Linux (without ‘signalfd’ and without
> ‘PR_SET_CHILD_SUBREAPER’), and I have yet to build dependencies in my
> childhurd to try it natively…
>
If there is something you want me to try let me know.

I was able to reproduce this by extracting the shepherd conf from the test and 
and attempting to run the test manually on the childhurd with the cross 
compiled shepherd (so you should not need to have a native toolchain available)


Might be the attempt to write to /proc.

Starting the log-directory-not-writable service causes the entire shepherd to 
become completely unresponsive causing the timeout failure.

A similar behavior occurs with

(call-with-output-file "/proc/1/something"
  (lambda (port)
    (display "Hello" port)))

which hangs my guile completely.

When I change the log file to a more normal unwritable path the test passes.


>> 2025-04-08 07:56:27 PID #<<process> id: 10610 command: 
>> ("/gnu/store/45ssyl3mps6yvh0k0zsgxs8wjqk9jd6r-bash-minimal-5.1.16/bin/bash" 
>> "-c" "ulimit -n 
>> >/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-nofiles-10595;
>>  sleep 600 & echo $! > 
>> /tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-service-pid-10595")> 
>> (test) is dead!
>>
>
> [...]
>
>> FAIL tests/forking-service.sh (exit status: 1)
>>
>
> This one is a bug fixed in Shepherd commit
> 82a333f8e83b4abbd19dbc6c9bab1b7d62cced0c.
>
Can confirm that this is fixed in 1.0.4rc1.


>> + herd -s t-socket-12440 start logger
>> Starting service system-log...
>> Service system-log has been started.
>> Service system-log started.
>>
>> ;;; (message-destination-> 
>> ("/tmp/guix-build-shepherd-1.0.3.drv-0/shepherd-1.0.3/t-syslog-12440"))
>> Service system-log running with value #<system-log 39f82c0>.
>> Starting service logger...
>> Service logger has been started.
>> + herd -s t-socket-12440 status logger
>> + grep stopped
>> ++ zcat 't-log-12440.*'
>> gzip: t-log-12440.*.gz: No such file or directory
>> Terminated
>>
>
> [...]
>
>> FAIL tests/services/system-log.sh (exit status: 124)
>>
>
> This one also hanged more than 3m it seems.
>
The shepherd syslog seems to be extremely slow:

I extracted the logger.scm and conf.scm from the test, removed the $ from the 
shell variables variables, started shepherd, echoed the test msg into  the kmsg 
file and then tried to start the shepherd syslog.

It took multiple minutes to see the "Service system-log running with value" in 
the log and then another couple of minutes for "herd start syslogd" to return. 
Afterwards trying to query the syslog status (or trying to interact with 
shepherd in any other way) takes forever to complete while syslogd is running.

I did this with the 1.0.4rc1 shepherd to work around the blocking socket 
problem with guix-daemon (#77610, it still fails on the first connection now 
but idk if this is a problem with shepherd or the guix-daemon)

With the 1.0.3 shepherd syslogd seems to be a lot quicker to start initally ( 
see the log from the first message) but still extremely slow to interact with 
afterwards.

Even after removing the test timeout the test will fail eventually after ~900 
seconds because $syslog_auth_file did not show up (yet?).


> To be continued…
>
> Ludo’.
>

Thank you for your hard work on shepherd.



Reply via email to