On Tue, Mar 03, 2015 at 09:01:49AM +0000, Naoya Horiguchi wrote:
> kexec disables (or "shoots down") all CPUs other than a crashing CPU before
> entering the 2nd kernel. But the MCE handler is still enabled after that,
> so if MCE happens and broadcasts over the CPUs after the main thread starts
> the 2nd kernel (which might not initialize MCE device yet, or might decide
> not to enable it,) MCE handler runs only on the other CPUs (not on the main
> thread,) leading to kernel panic with MCE synchronization. The user-visible
> effect of this bug is kdump failure.
>
> Our standard MCE handler do_machine_check() assumes some about system's
> status and it's hard to alter it to cover kexec/kdump context, so let's add
> another kdump-specific one and switch to it.
>
> Note that this problem exists since current MCE handler was implemented in
> 2.6.32, and recently commit 716079f66eac ("mce: Panic when a core has reached
> a timeout") made it more visible by changing the default behavior of the
> synchronization timeout from "ignore" to "panic".
>
> Signed-off-by: Naoya Horiguchi <n-horigu...@ah.jp.nec.com>
> Cc: <sta...@vger.kernel.org> [2.6.32+]
> ---
> ChangeLog v2 -> v3
> - go to "switch MCE handler" approach
>
> ChangeLog v1 -> v2
> - clear MSR_IA32_MCG_CTL, MSR_IA32_MCx_CTL, and CR4.MCE instead of using
>   global flag to ignore MCE events.
> - fixed the description of the problem
> ---
>  arch/x86/include/asm/mce.h       |  6 +++++
>  arch/x86/kernel/cpu/mcheck/mce.c | 47 ++++++++++++++++++++++++++++++++++++++++
>  arch/x86/kernel/crash.c          |  3 +++
>  3 files changed, 56 insertions(+)
>
> diff --git v3.19.orig/arch/x86/include/asm/mce.h v3.19/arch/x86/include/asm/mce.h
> index 51b26e895933..8010d4b77183 100644
> --- v3.19.orig/arch/x86/include/asm/mce.h
> +++ v3.19/arch/x86/include/asm/mce.h
> @@ -114,6 +114,9 @@ struct mca_config {
>  	int monarch_timeout;
>  	int panic_timeout;
>  	u32 rip_msr;
> +#ifdef CONFIG_KEXEC
> +	int kdump_cpu;
> +#endif

This CONFIG_KEXEC-ifdeffery is too ugly to live. Please put everything
in arch/x86/kernel/crash.c. AFAICT, you don't need to touch anything in
arch/x86/kernel/cpu/mcheck/ for your purposes.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
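
For illustration only, the "switch MCE handler" idea from the v3 changelog
can be modelled in plain userspace C with all state kept in one file and no
#ifdef in a shared struct. This is a rough sketch, not the patch and not
kernel code: every name below (mce_vector, kdump_mce_handler,
kdump_switch_mce_handler, raise_mce, the MCE_* return codes) is hypothetical,
and the behaviour of the kdump-specific handler (shot-down CPUs return
without waiting for a rendezvous) is an assumption made for the example.

```c
#include <assert.h>

/* Hypothetical outcome codes for this userspace model only. */
enum { MCE_SYNCED = 0, MCE_IGNORED = 1, MCE_HANDLED = 2 };

/* A handler takes the CPU number the (simulated) MCE arrived on. */
typedef int (*mce_handler_t)(int cpu);

/* File-local state: the CPU that will boot the 2nd kernel (-1 = none). */
int crashing_cpu = -1;

/* Stand-in for the normal handler, which would synchronize all CPUs. */
int default_mce_handler(int cpu)
{
	(void)cpu;
	return MCE_SYNCED;
}

/*
 * Assumed kdump-specific behaviour: CPUs that were shot down ignore the
 * event instead of waiting for a rendezvous; only the crashing CPU acts.
 */
int kdump_mce_handler(int cpu)
{
	return cpu == crashing_cpu ? MCE_HANDLED : MCE_IGNORED;
}

/* The currently installed handler, analogous to a handler vector. */
mce_handler_t mce_vector = default_mce_handler;

/* Called once on the crash path, before the other CPUs are shot down. */
void kdump_switch_mce_handler(int cpu)
{
	assert(cpu >= 0);
	crashing_cpu = cpu;
	mce_vector = kdump_mce_handler;
}

/* Simulate an MCE being delivered to the given CPU. */
int raise_mce(int cpu)
{
	return mce_vector(cpu);
}
```

In this model, raise_mce() goes through the default handler until
kdump_switch_mce_handler() runs; afterwards only the crashing CPU handles
the event and every other CPU returns immediately.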