On Wed, Mar 27, 2013 at 02:30:05PM -0700, Gurucharan Shetty wrote:
> Currently, when we stop a daemon, we first send it SIGTERM.
> If SIGTERM does not work within ~5 seconds, we send a SIGKILL.
> After sending SIGKILL, we wait only 4 seconds before giving
> up.
>
> If the system is extremely busy, there is a chance that the
> process is not killed by the kernel within 4 seconds. In such
> a case, when we try to start the daemon immediately, we see that
> the pid inside the pid-file is valid and assume that the daemon
> is still running. This leaves us in a state where the daemon is
> actually not running.
>
> This patch increases the time spent waiting for the kernel to kill
> the process to 60 seconds.
>
> Bug #15404.
> Signed-off-by: Gurucharan Shetty <gshe...@nicira.com>
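The "~5 seconds" and "4 seconds" figures follow from the sleep intervals in the `for action in ...` list of `stop_daemon` in `utilities/ovs-lib.in`, which is quoted in the diff below. The following is a minimal sketch that sums those intervals on each side of KILL. `wait_budget` is a hypothetical helper written only for this illustration. The totals are upper bounds, because the loop returns early once the process is gone, and they ignore the time taken by the checks themselves.

```shell
#!/bin/sh
# Hypothetical helper: given a stop_daemon-style action list, print the
# total sleep time (seconds) before KILL and after KILL.
wait_budget () {
    printf '%s\n' "$1" | awk '{
        phase = "term"; before = 0; after = 0
        for (i = 1; i <= NF; i++) {
            # TERM and FAIL send a signal or give up; they do not sleep
            if ($i == "TERM" || $i == "FAIL") continue
            if ($i == "KILL") { phase = "kill"; continue }
            # every other entry is a sleep interval
            if (phase == "term") before += $i; else after += $i
        }
        printf "%.2f %.2f\n", before, after
    }'
}

wait_budget "TERM .1 .25 .65 1 1 1 1 KILL 1 1 1 1 FAIL"
```

For the list above this prints `5.00 4.00`: 0.1 + 0.25 + 0.65 + 4 × 1 seconds of waiting for SIGTERM to take effect, then 4 × 1 seconds after SIGKILL.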
I see why you changed the FAIL case, but I think that it might instead be better to do something like the following, to avoid duplicating the pid_exists call in two places:

diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index d010abf..b44ab37 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -173,6 +173,9 @@ stop_daemon () {
     if test -e "$rundir/$1.pid"; then
         if pid=`cat "$rundir/$1.pid"`; then
             for action in TERM .1 .25 .65 1 1 1 1 KILL 1 1 1 1 FAIL; do
+                if pid_exists $pid >/dev/null 2>&1; then :; else
+                    return 0
+                fi
                 case $action in
                     TERM)
                         action "Killing $1 ($pid)" kill $pid
@@ -185,11 +188,7 @@ stop_daemon () {
                         return 1
                         ;;
                     *)
-                        if pid_exists $pid >/dev/null 2>&1; then
-                            sleep $action
-                        else
-                            return 0
-                        fi
+                        sleep $action
                         ;;
                 esac
             done

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
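To illustrate the restructuring in the diff above, here is a standalone sketch of the loop with the liveness check moved to the top. Because `pid_exists` now runs before every action, FAIL included, a process that has already exited is never signalled again and never reported as a failure. Several pieces are assumptions. `stop_pid` is a hypothetical name, and it takes a bare PID instead of reading a pid-file. The `pid_exists` shown is a hypothetical stand-in built on `kill -0`; ovs-lib's own helper may be implemented differently. Plain `kill`/`echo` replace ovs-lib's `action` and `log_failure_msg` helpers.

```shell
#!/bin/sh
# Hypothetical stand-in for ovs-lib's pid_exists: "kill -0" only checks that
# the process exists and may be signalled; it sends no signal.
pid_exists () {
    test -n "$1" && kill -0 "$1" 2>/dev/null
}

# Hypothetical sketch of the restructured stop_daemon loop for a bare PID.
# $2 optionally overrides the default action list.
stop_pid () {
    pid=$1
    for action in ${2:-TERM .1 .25 .65 1 1 1 1 KILL 1 1 1 1 FAIL}; do
        # Single liveness check before every action, so FAIL is reached
        # only if the process survived the last sleep.
        if pid_exists "$pid"; then :; else
            return 0
        fi
        case $action in
            TERM) kill "$pid" 2>/dev/null ;;
            KILL) kill -9 "$pid" 2>/dev/null ;;
            FAIL)
                echo "Killing $pid failed" >&2
                return 1
                ;;
            *) sleep "$action" ;;
        esac
    done
}
```

Note that `kill -0` also succeeds for a zombie, so this check only reports a process as gone once its parent has reaped it. A daemon that has detached from the shell that started it is normally reaped by init.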