init-d-script: stop action: missing --oknodo in s-s-d invocation

Jan Braun Thu, 12 Dec 2019 00:28:05 -0800

Hey,

Guillem Jover schrob:
> Actually, and this is something I've had pending raising for a long
> time, I don't understand that two s-s-d invocations pattern in stop
> which seems a bit pointless TBH? The --retry should make s-s-d wait
> for the child to exit, or terminate it forcibly.


I think the idea behind that construct is preventing
the following race condition:

1)  s-s-d scans for matching processes, sends TERM to at least one
    worker child of the daemon, gets interrupted.
2)  The worker child exits obligingly.
3)  The daemon itself, not having gotten TERM yet, notices the SIGCHLD
    and starts a replacement worker process.
4)  s-s-d gets resumed and proceeds with terminating the remaining
    processes it found in step 1, who subsequently die one way or the
    other.
4a) s-s-d does NOT rescan for newly appearing matches, thus missing the
    newly appeared worker from step 3.
5) Finding all matched processes gone, s-s-d exits 0.
6) The worker process from step 3 is still alive, causing issues.


Notes:
* I don't know whether s-s-d rescans in step 4a. But if it does, then I
  really can't see a point to all this.
* You can replace steps 2+3 by "the daemon randomly decides to spawn a
  new child".
* If the daemon, upon TERM, first kills its children, waits for them to
  die, and only then exits itself, then the race can't happen.
  That's what I'd expect as standard behaviour. But obviously, the init
  script also needs to work in situations where the daemon doesn't
  behave properly.


> I also don't understand the diverging --name usage, because DAEMON
> cannot have a different name to the --startas argument anyway, so it
> cannot be used for interpreted scripts.

Yes, that's weird.
Let me think out loud here for a bit:
--exec
    When matching, specifies the path of the executable running. That's
    the interpreter in case of scripts. (i.e. readlink /proc/$PID/exe)
--startas
    How to --start the daemon. Defaults to --exec.
--name
    Matches the /proc/$PID/comm entry (at least on Linux), which is the
    basename of the way it was started. It can be arbitrarily changed by
    the process (e.g. firefox-esr does this).¹


If that's correct, then

a) "--start --startas $DAEMON --exec $DAEMON" (line 49) is redundant.

b) "--stop --exec $DAEMON" (*both* stop invocations, line 78 and 89)
   will not match a $DAEMON that's a script, rather than a native
   executable.

c) Hence init-d-script doesn't currently work with daemons that are
   implemented as a script.²

d) Assuming $COMMAND_NAME is the basename of $DAEMON, then
   "--exec $DAEMON --name ${COMMAND_NAME}" is equivalent to
   "--exec $DAEMON" iff $DAEMON is the actual native executable (i.e.
   neither a symlink nor a script) and the process didn't change its
   "comm".

e) The symlink case doesn't seem too interesting. A package should know
   which one's the real executable, and I can't come up with a use case
   for "stop only if started though *this* symlink". But maybe I'm
   missing things here.

f) Presumably, whoever wrote the code had a daemon with children that
   changed their "comm" field, and the first s-s-d invocation is
   supposed to only kill the (parent) daemon. Hence, the first
   invocation matches on $PIDFILE and "comm", while the second
   invocation kills less discriminately.

Thus I *think* the code does what it's supposed to do.
Whether that's behaviour/complexity wanted enabled by default in
/lib/init/init-d-script , I can't judge. Note tho that trying to support
daemons implemented as scripts will necessarily blow away this code.


Two more observations:

* The second s-s-d --stop invocation has "--retry=0/30/KILL/5". What's
  that zero doing there? Shouldn't that be TERM instead, as in the first
  invocation?

* Runit's sv(8) sends TERM and CONT to bring down a service gracefully,
  because a STOPped process obviously can't react to TERM. Maybe that's
  something you want to adopt as well?

HTH and regards,
    Jan

¹ However, "comm" is distinct from argv[0], which *also* can be
arbitrarily changed (e.g. openntpd does that). Argh.
² out-of-the box. Obviously, stuff can be overridden to make presumably
anything work.

signature.asc
Description: PGP signature

Bug#945464: /lib/init/init-d-script: stop action: missing --oknodo in s-s-d invocation

Reply via email to