Hey Ludo, > Woow, good catch!
Thanks :) >> The work-around here is to save the installed SIGTERM handler and reset >> it. Then, after forking, the parent can restore the SIGTERM handler. The >> child will use the default SIGTERM handler that terminates the process. > > OK, makes sense. (Another occasion to see how something like > ‘posix_spawn’ would be more appropriate than fork + exec…) Didn't know about that function, but it seems way easier to manipulate and less error prone indeed! > Shouldn’t it be: > > (let ((pgid (getpgid pid))) > (if (= (getpgid 0) pgid) > (kill pid signal) ;don't kill ourself > (kill (-p pgid) signal))) Yes, I changed it. > Note: Use the most specific comparison procedure, ‘=’ in this case, > because we know we’re dealing with numbers (it enables proper type > checking, better compiler optimizations, etc.). More importantly, ‘eq?’ > performs pointer comparison, so it shouldn’t be used with numbers (in > practice it works with “fixnums” but not with bignums). Ok, noted! >> +# Try to trigger eventual race conditions, when killing a process between >> fork >> +# and execv calls. >> +for i in {1..50} >> +do >> + $herd restart test3 >> +done > > Would it reproduce the problem well enough? On a slow machine one time out of two, and on a faster machine, never. The 'reproducer' I used, was to add a 'sleep' between fork and exec, it works way better! Tell me if you think its better to drop it. > Could you send an updated patch? Here it is! > Thanks for the bug hunting and for the patch! Thanks for the fast review :) Mathieu
>From 79d3603bf15b8f815136178be8c8a236734a7596 Mon Sep 17 00:00:00 2001 From: Mathieu Othacehe <m.othac...@gmail.com> Date: Thu, 7 May 2020 18:39:41 +0200 Subject: [PATCH] service: Fix 'make-kill-destructor' when PGID is zero. When a process is forked, and before its GID is changed in "exec-command", it will share the parent GID, which is 0 for Shepherd. In that case, use the PID instead of the PGID. Also make sure to reset the SIGTERM handler before forking a process. Failing to do so, will result in stopping Shepherd if a SIGTERM is received between fork and execv calls. Restore the SIGTERM handler once the process has been forked. * modules/shepherd/service.scm (fork+exec-command): Save the current SIGTERM handler and reset it before forking. Then, restore it on the parent after forking. (make-kill-destructor): Handle the case when PGID is zero, between the process fork and exec. --- modules/shepherd/service.scm | 33 ++++++++++++++++++++++++++------- tests/forking-service.sh | 18 ++++++++++++++++++ 2 files changed, 44 insertions(+), 7 deletions(-) diff --git a/modules/shepherd/service.scm b/modules/shepherd/service.scm index 8604d2f..45fcf32 100644 --- a/modules/shepherd/service.scm +++ b/modules/shepherd/service.scm @@ -5,6 +5,7 @@ ;; Copyright (C) 2016 Alex Kost <alez...@gmail.com> ;; Copyright (C) 2018 Carlo Zancanaro <ca...@zancanaro.id.au> ;; Copyright (C) 2019 Ricardo Wurmus <rek...@elephly.net> +;; Copyright (C) 2020 Mathieu Othacehe <m.othac...@gmail.com> ;; ;; This file is part of the GNU Shepherd. ;; @@ -884,7 +885,18 @@ its PID." (unless %sigchld-handler-installed? (sigaction SIGCHLD handle-SIGCHLD SA_NOCLDSTOP) (set! %sigchld-handler-installed? #t)) - (let ((pid (primitive-fork))) + + ;; When forking a process, the signal handlers are inherited, until it + ;; forks. If SIGTERM is received by the forked process, before it calls + ;; execv, the installed SIGTERM handler, stopping Shepherd will be called. + ;; To avoid this, save the SIGTERM handler, disable it, and restore it once, + ;; the process has been forked. This way, the forked process will use the + ;; default SIGTERM handler stopping the process. + (let ((term-handler (match (sigaction SIGTERM) + ((proc . _) + proc))) + (pid (and (sigaction SIGTERM SIG_DFL) + (primitive-fork)))) (if (zero? pid) (exec-command command #:user user @@ -893,7 +905,10 @@ its PID." #:directory directory #:file-creation-mask file-creation-mask #:environment-variables environment-variables) - pid))) + (begin + ;; Restore the initial SIGTERM handler. + (sigaction SIGTERM term-handler) + pid)))) (define* (make-forkexec-constructor command #:key @@ -957,11 +972,15 @@ start." "Return a procedure that sends SIGNAL to the process group of the PID given as argument, where SIGNAL defaults to `SIGTERM'." (lambda (pid . args) - ;; Kill the whole process group PID belongs to. Don't assume that PID - ;; is a process group ID: that's not the case when using #:pid-file, - ;; where the process group ID is the PID of the process that - ;; "daemonized". - (kill (- (getpgid pid)) signal) + ;; Kill the whole process group PID belongs to. Don't assume that PID is + ;; a process group ID: that's not the case when using #:pid-file, where + ;; the process group ID is the PID of the process that "daemonized". If + ;; this procedure is called, between the process fork and exec, the PGID + ;; will still be zero (the Shepherd PGID). In that case, use the PID. + (let ((pgid (getpgid pid))) + (if (= (getpgid 0) pgid) + (kill pid signal) ;don't kill ourself + (kill (- pgid) signal))) #f)) ;; Produce a constructor that executes a command. diff --git a/tests/forking-service.sh b/tests/forking-service.sh index 7126c3d..bd9aac9 100644 --- a/tests/forking-service.sh +++ b/tests/forking-service.sh @@ -71,6 +71,17 @@ cat > "$conf"<<EOF #:pid-file "$PWD/$service2_pid") #:stop (make-kill-destructor) #:respawn? #t)) + +(define %command3 + '("$SHELL" "-c" "sleep 600")) + +(register-services + (make <service> + ;; A service that forks into a different process. + #:provides '(test3) + #:start (make-forkexec-constructor %command3) + #:stop (make-kill-destructor) + #:respawn? #t)) EOF cat $conf @@ -113,3 +124,10 @@ sleep 1; $herd status test2 | grep started test "`cat $PWD/$service2_started`" = "started started" + +# Try to trigger eventual race conditions, when killing a process between fork +# and execv calls. +for i in `seq 1 50` +do + $herd restart test3 +done -- 2.26.2