On 07/12/16 at 02:49am, 河合英宏 / KAWAI,HIDEHIRO wrote: > Hi Dave, > > Thanks for the comments. > > > From: Dave Young [mailto:dyo...@redhat.com] > > Sent: Monday, July 11, 2016 5:35 PM > > > > On 07/05/16 at 08:33pm, Hidehiro Kawai wrote: > > > This patch fixes one of the problems reported by Daniel Walker > > > (https://lkml.org/lkml/2015/6/24/44). > > > > > > If crash_kexec_post_notifiers boot option is specified, other CPUs > > > are stopped by smp_send_stop() instead of machine_crash_shutdown() > > > in crash_kexec() path. This behavior change leads two problems. > > > > > > Problem 1: > > > octeon_generic_shutdown() for MIPS OCTEON assumes that other CPUs are > > > still online and try to stop their watchdog timer. If > > > smp_send_stop() is called before octeon_generic_shutdown(), stopping > > > watchdog timer will fail because other CPUs have been offlined by > > > smp_send_stop(). > > > > > > panic() > > > if crash_kexec_post_notifiers == 1 > > > smp_send_stop() > > > atomic_notifier_call_chain() > > > kmsg_dump() > > > crash_kexec() > > > machine_crash_shutdown() > > > octeon_generic_shutdown() // shutdown watchdog for ONLINE CPUs > > > > > > Problem 2: > > > Most of architectures stop other CPUs in machine_crash_shutdown() > > > path, and they also do something needed for kdump. For example, > > > they save registers, disable virtualization extensions, and so on. > > > However, if smp_send_stop() stops other CPUs before > > > machine_crash_shutdown(), we miss those operations. > > > > > > How do we fix these problems? In the first place, we should stop > > > other CPUs as soon as possible when panic() was called, otherwise > > > other CPUs may wipe out a clue to the cause of the failure. So, we > > > replace smp_send_stop() with more suitable one for kdump. > > > > We have been avoiding extra things in panic path, but unfortunately > > crash_kexec_post_notifiers were added. I tend to agree the best place > > for this stuff is in 2nd kernel or purgatory instead of in 1st kernel. > > Several months ago, I posted a patch set which writes regs to SEL, generate > an event to send SNMP message, and start/stop BMC's watchdog timer in > purgatory. This feature requires BMC with KCS (Keyboard Controller Style) > I/F, but the most of enterprise grade server would have it. > (http://thread.gmane.org/gmane.linux.kernel.kexec/15382) > > Doing kmsg_dump things in purgatory wouldn't be suitable (should be done > in the 2nd kernel before enabling devices and IRQs?)
In theory it is doable maybe do something like oldmem_kmsg_dump while /proc/vmcore initializing? > > > As for this patch I'm not sure it is safe to replace the smp_send_stop > > with the kdump friendly function. I'm also not sure if the kdump friendly > > function is safe for kdump. Will glad to hear opinions from other > > arch experts. > > This stuff depends on architectures, so I speak only about > x86 (the logic doesn't change on other architectures at this time). > > kdump path with crash_kexec_post_notifiers disabled: > panic() > __crash_kexec() > crash_setup_regs() > crash_save_vmcoreinfo() > machine_crash_shutdown() > native_machine_crash_shutdown() > panic_smp_send_stop() /* mostly same as original > * kdump_nmi_shootdown_cpus() > */ > > kdump path with crash_kexec_post_notifiers enabled: > panic() > panic_smp_send_stop() > __crash_kexec() > crash_setup_regs() > crash_save_vmcoreinfo() > machine_crash_shutdown() > native_machine_crash_shutdown() > panic_smp_send_stop() // do nothing > > The difference is that stopping other CPUs before crash_setup_regs() > and crash_save_vmcoreinfo() or not. Since crash_setup_regs() and > crash_save_vmcoreinfo() just save information to some memory area, > they wouldn't be affected by panic_smp_send_stop(). This means > placing panic_smp_send_stop before __crash_kexec is safe. > > BTW, I noticed my patch breaks Xen kernel. I'll fix it in the > next version. But it does breaks stuff which depends on cpu not being disabled like problem 1 you mentioned in patch log. > > > BTW, if one want to use crash_kexec_post_notifiers he should take the > > risk of unreliable kdump. How about only call smp_send_stop in case no > > crash_kexec_post_notifiers being used. > > Unlike panic_smp_send_stop()/kdump_nmi_shootdown_cpus(), smp_send_stop() > for x86 tries to stop other CPUs with normal IPI before issuing NMI IPI. > This would be because NMI IPI has a risk of deadlock. We checked if > the kdump path has a risk of deadlock in the case of NMI panic and fixed > it. But I'm not sure about normal panic path. I agree with that use > smp_send_stop if crash_kexec_post_notifiers or kdump is disabled. What I mean is like below, problem 1 will not exist in this way, but kdump will be unreliable: if (!crash_kexec_post_notifiers) smp_send_stop() > > > > This patch solves Problem 2 by replacing smp_send_stop() in panic() > > > with panic_smp_send_stop(). This is a weak function which calls > > > smp_send_stop(), and architecture dependent code may override this > > > with appropriate one. This patch only provides x86-specific version. > > > > It does not fix the Problem 1, it seem not possible to fix it? > > Problem 1 depends on architectures, and at least it doesn't happen > on x86. I can try to fix the Problem 1 for MIPS, but I can't test it. > Possible solution will be to use an smp_send_stop variant which stop > the CPU without offlining. > > Best regards, > > Hidehiro Kawai > Hitachi, Ltd. Research & Development Group > [snip] Thanks Dave