On 2024-01-10 00:34:48 +0100, Ludovic Courtès wrote:
> Tomas Volf <~@wolfsden.cz> skribis:
> 
> > On 2024-01-07 15:08:59 +0100, Ludovic Courtès wrote:
> 
> [...]
> 
> >>   ** Do not accidentally wait for Linux kernel thread completion
> >>      (<https://issues.guix.gnu.org/67132>)
> >>
> >>   In cases a PID file contained a bogus PID or one that’s only valid in a
> >>   separate PID namespace, shepherd could end up waiting for the termination of
> >>   what’s actually a Linux kernel thread, such as PID 2 (“kthreadd”).  This
> >>   situation is now recognized and avoided.
> >
> > This is great, I will not have to remember to run `modprobe -r mt7921e' before
> > each shutdown anymore.  I hope.  Looking forward to getting it in the Guix :)
> 
> D’oh, why did you have to do that?

Otherwise shepherd gets stuck on shutdown, waiting for a process named

    [mt76-tx phy0]

to terminate with messages along the lines of:

    shepherd[1]: waiting for process termination (processes left: (1 678))

It is a kernel thread as far as I can tell (based on
https://stackoverflow.com/a/12231039):

    $ cd /proc/678
    $ cat cmdline
    $ readlink exe; echo $?
    1

Removing the module mt7921e stops the thread, so shepherd does not wait for it.
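
For reference, the check above can be wrapped into a small shell function.  This
is only a sketch of the heuristic from the Stack Overflow answer (empty cmdline,
unresolvable exe link).  The name `is_kernel_thread' is made up for illustration,
it is not how Shepherd itself decides, and zombie processes may be misreported
as kernel threads.

```shell
# Hypothetical helper (not part of Shepherd): guess whether PID is a
# Linux kernel thread, using only /proc.
is_kernel_thread() {
    pid=$1

    # No /proc entry means there is no such process at all.
    [ -d "/proc/$pid" ] || return 1

    # User-space processes normally have a non-empty command line.
    # Strip the NUL separators before testing for emptiness.
    cmdline=$(cat "/proc/$pid/cmdline" 2>/dev/null | tr -d '\0')
    [ -z "$cmdline" ] || return 1

    # Kernel threads have no executable, so the exe link does not resolve.
    if readlink "/proc/$pid/exe" >/dev/null 2>&1; then
        return 1
    fi
    return 0
}
```

On the machine above it would be called as `is_kernel_thread 678 && echo yes'.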

> How did Shepherd end up with “wrong” PID?

That I do not know.  It is visible in `ps' output, so I assume shepherd picked
it up on its own somehow.

> 
> I hope this release fixes it!

As far as I can tell, 0.10.3 has already been added to Guix:

    $ ps 1 | cat
      PID TTY      STAT   TIME COMMAND
        1 ?        Sl     0:01 /gnu/store/bhynhk0c6ssq3fqqc59fvhxjzwywsjbb-guile-3.0.9/bin/guile --no-auto-compile /gnu/store/06mz0yjkghi7r6d7lmhvv7gryipljhdd-shepherd-0.10.3/bin/shepherd --config /gnu/store/klkqq2y65k141rlipq4ls0w2rlhds12h-shepherd.conf

So, sadly, the release did not resolve this issue, and I am unsure why.  I am
not familiar with Shepherd's code base, but a quick look at the git log suggests
that the procedure (@@ (shepherd service) pseudo-process?) is the relevant one.
When I try it from a REPL, it returns #t.

    $ guix shell guile shepherd guile-fibers -- guile
    GNU Guile 3.0.9
    Copyright (C) 1995-2023 Free Software Foundation, Inc.
    
    Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
    This program is free software, and you are welcome to redistribute it
    under certain conditions; type `,show c' for details.
    
    Enter `,help' for help.
    scheme@(guile-user)> ,use (shepherd service)
    scheme@(guile-user)> ((@@ (shepherd service) pseudo-process?) 688)
    $1 = #t

So it *should* work?  However, the issue is caused by a non-free WiFi driver on
a corrupted kernel, so I am not sure it is even a problem that needs to be
solved...  I would (obviously) like to see it resolved, but I probably cannot
even report it as a bug, since it requires non-free hardware and software to
reproduce.

Tomas

PS: It is interesting that `guix shell guile shepherd' is not enough;
    guile-fibers has to be specified explicitly as well.  Is that expected?

-- 
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.
