2010/12/16 Clint Byrum <cl...@fewbar.com>: > > /etc/init.d/sendsigs has this code: > > > # Upstart jobs have their own "stop on" clauses that sends > # SIGTERM/SIGKILL just like this, so if they're still running, > # they're supposed to be > for pid in $(initctl list | sed -n -e "/process [0-9]/s/.*process > //p"); do > OMITPIDS="${OMITPIDS:+$OMITPIDS }-o $pid" > done > > > It uses this to determine which pids not to kill because, presumably, upstart > should be managing them. > > However, this code is flawed. killall5 will kill the children of all of > these if they are multi process daemons or scripts running things.
This observation is correct. On the other hand, isn't this exactly what the sendsigs script is for: clean up any remaining, stray processes which have not been stopped by its corresponding sysv init script or upstart job (or have been e.g. started by the user)? But I guess you are right, we should first stop all upstart jobs, give them time to finish stopping, and then let sendsigs clean up anything remaining afterwards. > However, this technique can actually be used to determine if there are > still jobs that are supposed to be stopped, but haven't finished > stopping yet. Since they should be listed as stop/(pre-stop|post- > stop|killed), we can determine exactly which pids we expect to go away. > Since upstart has its own idea of how long to wait before it kills > these, we should actually wait indefinitely. > > I'm attaching a debdiff that solves the race as far as I can tell, > though I think it needs a good long look, since it could mean shutdowns > hang for a long time waiting (I'm especially curious if the pre-stop > /post-stop's are subject to kill timeout) This code is still racy, afaics. What about upstart jobs, which are not stopped by "stop on runlevel [016]"? They could receive their stop signal at a point when your loop has already been run. If you don't want to change existing jobs, we probably have to pick up Ante's suggestion, and do the following in sendsigs: 1) run a for loop to wait for *all* running upstart jobs to stop. upstart jobs which need to keep running past sendsigs (e.g. plymouth) need to signal that using a similar mechanism like the killall5 sendsigs.d omit interface. I'd at least give upstart jobs 60secs time to stop, so big databases etc have enough time to cleanly shutdown 2.) run a for loop and send SIGTERM all remaining processes, but do *not* add upstart pids to $OMITPIDS 3.) send a final SIGKILL if any processes are left. Regarding 1.), it would be nice to have a native C implementation in upstart, instead of running initctl, grep and sleep manually. -- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth? -- You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to mysql-5.1 in ubuntu. https://bugs.launchpad.net/bugs/688541 Title: race condition on shutdown (leads to corrupted fs) -- Ubuntu-server-bugs mailing list Ubuntu-server-bugs@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs