On Wed, Sep 04, 2013 at 11:52:53AM -0400, Tony Espy wrote:
> On 09/04/2013 05:49 AM, Evan Dandrea wrote:
> > In another discussion, James Hunt raised the possibility of
> > periodically checking for runaway processes on Touch, killing those
> > consuming 100% CPU while creating a report to be sent to
> > https://errors.ubuntu.com.
> >
> > I've summarised the key points of that discussion here into a
> > proposal. The hope of this is that it gives everyone a chance to
> > provide input.

> Is this a proposal for 13.10?

I think it's unrealistic to think anything discussed here would land for
13.10. We already have plenty of other things on our plate that are on the
critical path for 13.10. :)

> I understand the basic reasoning, but wantonly killing system processes
> and then hoping that the system will always gracefully recover sounds a
> bit risky to me.

> Many system service have complex start-up sequences, and although they
> *should* handle restarts properly, they may not, potentially leaving the
> device in an inconsistent state. Processes handled by upstart should
> work, but what about helper processes ( eg. dhclient ), are all
> guaranteed to be automatically re-started?

If the system does not behave sanely when one of its processes dies
unexpectedly, that's a serious bug that we need to fix. Whether or not this
particular runaway-killer is implemented, processes may die at any time due
to bugs (e.g., SIGSEGV being raised), and the system needs to be robust in
the face of such problems.

I can't say that there is currently any *guarantee* that this is how all
components of the system operate, but it is certainly the case that Ubuntu
has been designed with this kind of graceful failure in mind. If we're not
confident that this is how the system actually behaves, maybe we should be
testing that.

Anyway, point taken that we shouldn't deploy something that could cause
processes that were previously perfectly reliable to suddenly be killed by
some other process which arbitrarily decides they're "misbehaving", thus
sending the whole system into turmoil. If we're going to go around killing
system processes, we should be sure that the cure isn't worse than the
disease.

> > == Examples ==
> >
> > There are a few examples of this problem biting us already.
> Two of the three bugs listed below "had" bitten us, but have been fixed.

Certainly; any salient examples are going to be bugs we already know about,
and thus which are likely to be fixed or in progress.

The question I have is: would a monitor/killer for runaway processes have
improved our response to these bugs? Would it have resulted in earlier
detection? Easier diagnosis? Faster fixing? Would such monitoring tell us
about other such bugs that we are currently unaware of and need to be?

I'm not convinced that the answer is "yes" to any of these. Obviously, the
only way to know if it would tell us about bugs we're unaware of is to try
it and see :), but I think the fact that we are currently unaware of them is
already a strong indicator that they should not be a high priority, because
if they were high-impact they would organically rise to our attention.

Cheers,
--
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                    http://www.debian.org/
slanga...@ubuntu.com                                     vor...@debian.org
--
Mailing list: https://launchpad.net/~ubuntu-phone
Post to     : ubuntu-phone@lists.launchpad.net
Unsubscribe : https://launchpad.net/~ubuntu-phone
More help   : https://help.launchpad.net/ListHelp