On Tue, Jul 14, 2015 at 12:06:15PM -0500, Eric W. Biederman wrote:
> Vivek Goyal <vgo...@redhat.com> writes:
>
> > On Tue, Jul 14, 2015 at 03:48:33PM +0000, dwal...@fifo99.com wrote:
> >> On Tue, Jul 14, 2015 at 11:40:40AM -0400, Vivek Goyal wrote:
> >> > On Tue, Jul 14, 2015 at 03:34:30PM +0000, dwal...@fifo99.com wrote:
> >> > > On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
> >> > > > On Tue, Jul 14, 2015 at 01:59:19PM +0000, dwal...@fifo99.com wrote:
> >> > > > > On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> >> > > > > > dwal...@fifo99.com writes:
> >> > > > > >
> >> > > > > > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> >> > > > > > >> Hidehiro Kawai <hidehiro.kawai...@hitachi.com> writes:
> >> > > > > > >>
> >> > > > > > >> > You can call panic notifiers and kmsg dumpers before kdump by
> >> > > > > > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> >> > > > > > >> > However, it doesn't make sense if kdump is not available. In that
> >> > > > > > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> >> > > > > > >> > you can't change the value of the parameter.
> >> > > > > > >>
> >> > > > > > >> Nacked-by: "Eric W. Biederman" <ebied...@xmission.com>
> >> > > > > > >
> >> > > > > > > I think it would make sense if he just replaced "kdump" with "kexec".
> >> > > > > >
> >> > > > > > It would be less insane, however it still makes no sense as without
> >> > > > > > kexec on panic support crash_kexec is a noop. So the value of the
> >> > > > > > setting makes no difference.
> >> > > > >
> >> > > > > Can you explain more, I don't really understand what you mean. Are you suggesting
> >> > > > > the whole "crash_kexec_post_notifiers" feature has no value ?
> >> > > >
> >> > > > Daniel,
> >> > > >
> >> > > > BTW, why are you using crash_kexec_post_notifiers commandline? Why not
> >> > > > without it?
> >> > >
> >> > > It was explained in the prior thread but to rehash, the notifiers are used to do a switch
> >> > > over from the crashed machine to another redundant machine.
> >> >
> >> > So why not detect failure using polling or issue notifications from second
> >> > kernel.
> >> >
> >> > IOW, expecting that a crashed machine will be able to deliver notification
> >> > reliably is flawed to begin with, IMHO.
> >>
> >> It's flawed to think you can kexec, but you still do it right ? I've not gotten into
> >> the deep details of this switching process, but that's how this interface is used.
> >
> > Sure. But the deal here is that users of interface know that sometimes it
> > can be unreliable. And in the absence of more reliable mechanism, somewhat
> > less reliable mechanism is fine.
> >
> >>
> >> > If a machine is failing, there are high chance it can't deliver you the
> >> > notification. Detecting that failure using some kind of polling mechanism
> >> > might be more reliable. And it will make even kdump mechanism more
> >> > reliable so that it does not have to run panic notifiers after the crash.
> >>
> >> I think what you're suggesting is that my company should change how its hardware works
> >> and that's not really an option for me. This isn't a simple thing like checking over the
> >> network if the machine is down or not, this is way more complex hardware design.
> >
> > That means you are ready to live with an unreliable design. There might be
> > cases where notifier does not get run properly and you will not do switch
> > despite the fact that OS has failed. I was just trying to nudge you in
> > a direction which could be more reliable mechanism.
>
> Sigh I see some deep confusion going on here.
>
> The panic notifiers are just that panic notifiers. They have not been
> nor should they be tied to kexec. If those notifiers force a switch
> over between machines I fail to see why you would care if it was
> kexec or another panic situation that is forcing that switchover.

Hidehiro isn't fixing the failover situation on my side, he's fixing register information collection when crash_kexec_post_notifiers is used.

Daniel