On Wed, Mar 27, 2013 at 3:38 PM, Ben Pfaff <b...@nicira.com> wrote:
> On Wed, Mar 27, 2013 at 02:30:05PM -0700, Gurucharan Shetty wrote:
>> Currently, when we stop a daemon, we first send it SIGTERM.
>> If SIGTERM did not work within ~5 seconds, we send a SIGKILL.
>> After sending SIGKILL, we wait only for 4 seconds, before giving
>> up.
>>
>> If the system is extremely busy, there is a chance that a
>> process is not killed by the kernel within 4 seconds. In such
>> a case, when we try to start the daemon immediately, we see that
>> the pid inside the pid-file is valid and assume that the daemon
>> is still running. This leaves us in a state, where the daemon is
>> actually not running.
>>
>> This patch increases the time waiting for the kernel to kill the
>> process to 60 seconds.
>>
>> Bug #15404.
>> Signed-off-by: Gurucharan Shetty <gshe...@nicira.com>
>
> I see why you changed the FAIL case, but I think that it might instead
> be better to do something like the following, to avoid duplicating the
> pid_exists call in two places:
>
> diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
> index d010abf..b44ab37 100644
> --- a/utilities/ovs-lib.in
> +++ b/utilities/ovs-lib.in
> @@ -173,6 +173,9 @@ stop_daemon () {
>      if test -e "$rundir/$1.pid"; then
>          if pid=`cat "$rundir/$1.pid"`; then
>              for action in TERM .1 .25 .65 1 1 1 1 KILL 1 1 1 1 FAIL; do
> +                if pid_exists $pid >/dev/null 2>&1; then :; else
> +                    return 0
> +                fi

Yes, this is better. I will make the change and commit it to master and 1.10.

>                  case $action in
>                      TERM)
>                          action "Killing $1 ($pid)" kill $pid
> @@ -185,11 +188,7 @@ stop_daemon () {
>                          return 1
>                          ;;
>                      *)
> -                        if pid_exists $pid >/dev/null 2>&1; then
> -                            sleep $action
> -                        else
> -                            return 0
> -                        fi
> +                        sleep $action
>                          ;;
>                  esac
>              done
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
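
For anyone who wants to try the control flow from the quoted diff outside of ovs-lib, here is a minimal standalone sketch. It is not the committed code. stop_pid is a hypothetical name (the real function is stop_daemon and reads the pid from "$rundir/$1.pid"), and the pid_exists below is a stand-in that assumes "kill -0" is a good enough liveness probe. It also assumes a sleep that accepts fractional seconds, as the original loop does.

```shell
#!/bin/sh
# Hypothetical stand-in for ovs-lib's pid_exists. Assumption: "kill -0"
# is an acceptable liveness probe for a process we may signal.
pid_exists () {
    kill -0 "$1" 2>/dev/null
}

# Sketch of the escalation loop with the liveness check hoisted to the
# top of the loop. stop_pid is a hypothetical name used for illustration.
stop_pid () {
    pid=$1
    for action in TERM .1 .25 .65 1 1 1 1 KILL 1 1 1 1 FAIL; do
        # Checked before every step, including FAIL, so a process that
        # has already exited is never signalled or reported as a failure.
        if pid_exists "$pid"; then :; else
            return 0
        fi
        case $action in
            TERM) kill "$pid" ;;
            KILL) kill -9 "$pid" ;;
            FAIL) echo "Killing $pid failed" >&2; return 1 ;;
            *)    sleep "$action" ;;
        esac
    done
}
```

Because the check now runs ahead of the FAIL step as well, a process that dies during the last one-second sleep is reported as stopped rather than as a failure, which is the case the single hoisted pid_exists call is meant to cover.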