On Thu, Sep 5, 2013 at 8:04 PM, Evan Dandrea <e...@ubuntu.com> wrote:
> On 5 September 2013 18:35, Steve Langasek <steve.langa...@ubuntu.com> wrote:
>>> Is this a proposal for 13.10?
>>
>> I think it's unrealistic to think anything discussed here would land for
>> 13.10. We already have plenty of other things on our plate that are on the
>> critical path for 13.10. :)
>
> Agreed :)
>
>> Anyway, point taken that we shouldn't deploy something that could cause
>> processes that were previously perfectly reliable to suddenly be killed by
>> some other process which arbitrarily decides they're "misbehaving", thus
>> sending the whole system into turmoil. If we're going to go around killing
>> system processes, we should be sure that the cure isn't worse than the
>> disease.
>
It's still an interesting idea, much like the chaos monkeys commonly used
in cloud infrastructures: essentially a predator that, either randomly or
according to some criteria, terminates processes and causes havoc.

> We should also be careful to not spin out of control ourselves, trying
> to play whack-a-mole with an out-of-control process. Presumably this
> is handled by upstart's respawn stanza?
>
>> Certainly; any salient examples are going to be bugs we already know about,
>> and thus which are likely to be fixed or in progress.
>>
>> The question I have is: would a monitor/killer for runaway processes have
>> improved our response to these bugs? Would it have resulted in earlier
>> detection? Easier diagnosis? Faster fixing? Would such monitoring tell us
>> about other such bugs that we are currently unaware of and need to be?
>
> I could not agree more with the data-driven approach here. I think
> you're absolutely spot on to suggest this needs to prove its value
> with some concrete numbers.
>
>> I'm not convinced that the answer is "yes" to any of these. Obviously, the
>> only way to know if it would tell us about bugs we're unaware of is to try
>> it and see :), but I think the fact that we are currently unaware of them is
>> already a strong indicator that they should not be a high priority, because
>> if they were high-impact they would organically rise to our attention.
>
> So knowing what problems are out there is half of what something like
> this gives us. https://errors.ubuntu.com has discovered lots of
> serious problems not caught by our pre-release QA.
>
> The other half is knowing how critical each problem is. Some subset of
> the problems out there may rise to our attention, but we won't know how
> important they are because we won't have a clear picture of how many
> systems they affect. Engineering resource is finite. We have to make
> tough decisions on which issues to fix are going to get the most bang
> for the buck.
>

+1.
Cheers,
Thomas

> --
> Mailing list: https://launchpad.net/~ubuntu-phone
> Post to : ubuntu-phone@lists.launchpad.net
> Unsubscribe : https://launchpad.net/~ubuntu-phone
> More help : https://help.launchpad.net/ListHelp