Hello, The previous patch was not working. The reason is that, when a process is forked and before execv is called, it shares the parent signal handler.
So when sending a SIGTERM to the child process, we are stopping root-service, if the signal is received before the child has forked. The work-around here is to save the installed SIGTERM handler and reset it. Then, after forking, the parent can restore the SIGTERM handler. The child will use the default SIGTERM handler that terminates the process. WDYT? Thanks, Mathieu
>From aa6f67068f1fdd622673ec0223f05fd8f8a96baa Mon Sep 17 00:00:00 2001 From: Mathieu Othacehe <m.othac...@gmail.com> Date: Thu, 7 May 2020 18:39:41 +0200 Subject: [PATCH] service: Fix 'make-kill-destructor' when PGID is zero. When a process is forked, and before its GID is changed in "exec-command", it will share the parent GID, which is 0 for Shepherd. In that case, use the PID instead of the PGID. Also make sure to reset the SIGTERM handler before forking a process. Failing to do so, will result in stopping Shepherd if a SIGTERM is received between fork and execv calls. Restore the SIGTERM handler once the process has been forked. * modules/shepherd/service.scm (fork+exec-command): Save the current SIGTERM handler and reset it before forking. Then, restore it on the parent after forking. (make-kill-destructor): Handle the case when PGID is zero, between the process fork and exec. --- modules/shepherd/service.scm | 36 +++++++++++++++++++++++++++++------- tests/forking-service.sh | 18 ++++++++++++++++++ 2 files changed, 47 insertions(+), 7 deletions(-) diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm index 8604d2f..c68d7e0 100644 --- a/modules/shepherd/service.scm +++ b/modules/shepherd/service.scm @@ -5,6 +5,7 @@ ;; Copyright (C) 2016 Alex Kost <alez...@gmail.com> ;; Copyright (C) 2018 Carlo Zancanaro <ca...@zancanaro.id.au> ;; Copyright (C) 2019 Ricardo Wurmus <rek...@elephly.net> +;; Copyright (C) 2020 Mathieu Othacehe <m.othac...@gmail.com> ;; ;; This file is part of the GNU Shepherd. ;; @@ -884,7 +885,18 @@ its PID." (unless %sigchld-handler-installed? (sigaction SIGCHLD handle-SIGCHLD SA_NOCLDSTOP) (set! %sigchld-handler-installed? #t)) - (let ((pid (primitive-fork))) + + ;; When forking a process, the signal handlers are inherited, until it + ;; forks. If SIGTERM is received by the forked process, before it calls + ;; execv, the installed SIGTERM handler, stopping Shepherd will be called. + ;; To avoid this, save the SIGTERM handler, disable it, and restore it once, + ;; the process has been forked. This way, the forked process will use the + ;; default SIGTERM handler stopping the process. + (let ((term-handler (match (sigaction SIGTERM) + ((proc . _) + proc))) + (pid (and (sigaction SIGTERM SIG_DFL) + (primitive-fork)))) (if (zero? pid) (exec-command command #:user user @@ -893,7 +905,10 @@ its PID." #:directory directory #:file-creation-mask file-creation-mask #:environment-variables environment-variables) - pid))) + (begin + ;; Restore the initial SIGTERM handler. + (sigaction SIGTERM term-handler) + pid)))) (define* (make-forkexec-constructor command #:key @@ -957,11 +972,18 @@ start." "Return a procedure that sends SIGNAL to the process group of the PID given as argument, where SIGNAL defaults to `SIGTERM'." (lambda (pid . args) - ;; Kill the whole process group PID belongs to. Don't assume that PID - ;; is a process group ID: that's not the case when using #:pid-file, - ;; where the process group ID is the PID of the process that - ;; "daemonized". - (kill (- (getpgid pid)) signal) + ;; Kill the whole process group PID belongs to. Don't assume that PID is + ;; a process group ID: that's not the case when using #:pid-file, where + ;; the process group ID is the PID of the process that "daemonized". If + ;; this procedure is called, between the process fork and exec, the PGID + ;; will still be zero (the Shepherd PGID). In that case, use the PID. + (let ((current-pgid (getpgid 0)) + (pgid (getpgid pid))) + (if (eq? pgid current-pgid) + (begin + (kill pid signal)) + (begin + (kill (- pgid) signal)))) #f)) ;; Produce a constructor that executes a command. diff --git a/tests/forking-service.sh b/tests/forking-service.sh index 7126c3d..47b09a2 100644 --- a/tests/forking-service.sh +++ b/tests/forking-service.sh @@ -71,6 +71,17 @@ cat > "$conf"<<EOF #:pid-file "$PWD/$service2_pid") #:stop (make-kill-destructor) #:respawn? #t)) + +(define %command3 + '("$SHELL" "-c" "sleep 600")) + +(register-services + (make <service> + ;; A service that forks into a different process. + #:provides '(test3) + #:start (make-forkexec-constructor %command3) + #:stop (make-kill-destructor) + #:respawn? #t)) EOF cat $conf @@ -113,3 +124,10 @@ sleep 1; $herd status test2 | grep started test "`cat $PWD/$service2_started`" = "started started" + +# Try to trigger eventual race conditions, when killing a process between fork +# and execv calls. +for i in {1..50} +do + $herd restart test3 +done -- 2.26.0