On Thu, 2010-12-16 at 15:45 +0000, Michael Biebl wrote: > 2010/12/16 Clint Byrum <cl...@fewbar.com>: > > > > > > I'm attaching a debdiff that solves the race as far as I can tell, > > though I think it needs a good long look, since it could mean shutdowns > > hang for a long time waiting (I'm especially curious if the pre-stop > > /post-stop's are subject to kill timeout) > > This code is still racy, afaics. What about upstart jobs, which are > not stopped by "stop on runlevel [016]"? They could receive their stop > signal at a point when your loop has already been run. >
Indeed, there is still a race I think now that I dig through upstart's code a bit. If any of the jobs in the stop/!waiting state have 'stop on stopped' jobs that will be stopped after they stop, the event isn't emitted until *after* the transition to stop/waiting. thread A (upstart job foo): start/running -> stop/pre-stop sends TERM to owned process stop/pre-stop -> stop/killed process dies stop/killed -> stop/waiting emit stopped JOB=foo thread B (upstart job baz) start/running -> stop/pre-stop sends kill to owned process stop/pre-stop -> stop/killed process dies stop/killed -> stop/waiting thread C (sleep loop) runs initctl list greps sleeps runs initctl list greps sleeps list is handled by doing a "get all jobs" command first, and then individual status commands for each job, so its entirely possible that we will ask for the status of baz and it will say start/running, and then foo finishes its transition, then we ask for foo's status and it is stop/waiting, and we think we're done. This race would probably be solved by having a "list all jobs with status" command, as long as the stopped event is guaranteed to be consumed before any commands, which, I believe it will. One delicate issue is that if an upstart managed process dies for any other reason than being stopped, upstart will try to respawn it, so we can't just go sending SIGTERM/SIGKILL to all pids, as upstart will fight us on those. We actually have to stop everything. > If you don't want to change existing jobs, we probably have to pick up > Ante's suggestion, and do the following in sendsigs: > > 1) run a for loop to wait for *all* running upstart jobs to stop. > upstart jobs which need to keep running past sendsigs (e.g. plymouth) > need to signal that using a similar mechanism like the killall5 > sendsigs.d omit interface. I'd at least give upstart jobs 60secs time > to stop, so big databases etc have enough time to cleanly shutdown IMO, leaving out a valid stop on that gets it stopped at or before runlevel [016] is the equivilent of the omit interface. You've started it, saying exactly when upstart should or should not stop it. However, if you've wandered into the scenario mentioned above with stop on stopped foo, then we need to handle that. > 2.) run a for loop and send SIGTERM all remaining processes, but do > *not* add upstart pids to $OMITPIDS See above, you'd have to send 'stop' commands to upstart for them, instead of omitting them. > 3.) send a final SIGKILL if any processes are left. > I'd say "let upstart do that".. but how do we know when we can continue on to unmounting? I suppose after a lengthy timeout (60s does seem long enough, though mysql can take longer) this makes sense. > > Regarding 1.), it would be nice to have a native C implementation in > upstart, instead of running initctl, grep and sleep manually. > I agree, but I'm having trouble envisioning exactly what one would ask for. "Block until all current goals are reached." Would work maybe. -- You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to mysql-5.1 in ubuntu. https://bugs.launchpad.net/bugs/688541 Title: race condition on shutdown (leads to corrupted fs) -- Ubuntu-server-bugs mailing list Ubuntu-server-bugs@lists.ubuntu.com Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs