Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-02 Thread Borislav Petkov
On Mon, Mar 02, 2015 at 11:50:49AM -0500, Prarit Bhargava wrote:
> Unless entering a deep C state kicks an MCE ... which we've seen with flaky
> hardware.

If that is the case, you'll see the MCE not only when entering kdump.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you re

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-02 Thread Prarit Bhargava
On 03/02/2015 11:32 AM, Borislav Petkov wrote:
> On Mon, Mar 02, 2015 at 11:33:33PM +0900, Naoya Horiguchi wrote:
>> Yes, CPU offlining is one option to keep other CPUs quiet. I'm not sure why
>> current kexec implementation doesn't offline the other CPUs but just doing
>> cpu_relax() loop, but m

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-02 Thread Borislav Petkov
On Mon, Mar 02, 2015 at 11:33:33PM +0900, Naoya Horiguchi wrote:
> Yes, CPU offlining is one option to keep other CPUs quiet. I'm not sure why
> current kexec implementation doesn't offline the other CPUs but just doing
> cpu_relax() loop, but my guess is that in some kernel panic situation (like
>

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-02 Thread Naoya Horiguchi
On Mon, Mar 02, 2015 at 01:17:01PM +0100, Borislav Petkov wrote:
On Mon, Mar 02, 2015 at 02:31:19AM +, Naoya Horiguchi wrote:
> And please note that the target of this patch is an MCE when the kernel is
> already running on kdump code (so crashing happened *not* because of the MCE).
> In that

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-02 Thread Borislav Petkov
On Mon, Mar 02, 2015 at 02:31:19AM +, Naoya Horiguchi wrote:
> And please note that the target of this patch is an MCE when the kernel is
> already running on kdump code (so crashing happened *not* because of the MCE).
> In that case, we can expect that kdump works fine if the MCE hits the "kdu

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-01 Thread Naoya Horiguchi
On Fri, Feb 27, 2015 at 06:27:16PM +, Luck, Tony wrote:
> > When CR4.MCE=0b and an MCE happens, it will shutdown the system, at
> > least on Intel, according to Tony
>
> I checked with the architects ... and I was right. If you clear CR4.MCE
> you'll still see the machine check - and you'll

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-03-01 Thread Naoya Horiguchi
On Fri, Feb 27, 2015 at 08:14:47AM -0500, Prarit Bhargava wrote:
> On 02/27/2015 07:46 AM, Naoya Horiguchi wrote:
> > Hi Prarit,
> >
> > On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
> > ...
> >> > @@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs
> >> > *r

RE: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Thread Luck, Tony
> When CR4.MCE=0b and an MCE happens, it will shutdown the system, at
> least on Intel, according to Tony

I checked with the architects ... and I was right. If you clear CR4.MCE you'll still see the machine check - and you'll pull the big system reset lever. If you think the other cpus can survi
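
[Editor's sketch, not part of the original message.] The behaviour described above can be modelled in a few lines of C. This is a user-space illustration only: it does not touch the real control register, and the enum and function names are hypothetical. The one hardware fact it relies on is that the machine-check enable flag is bit 6 of CR4 (X86_CR4_MCE in the Linux headers); treating "shutdown" as a platform reset follows the description in the message and may vary by system.

```c
/* CR4 bit 6 is the machine-check enable flag (X86_CR4_MCE in Linux). */
#define CR4_MCE (1UL << 6)

/* Hypothetical names, used only for this illustration. */
enum mc_outcome {
    MC_DELIVERED_TO_HANDLER, /* #MC is raised and the OS handler runs */
    MC_SHUTDOWN              /* CPU enters shutdown; the platform usually resets */
};

/* What happens when a machine check hits a CPU with the given CR4 value. */
static enum mc_outcome machine_check_outcome(unsigned long cr4)
{
    return (cr4 & CR4_MCE) ? MC_DELIVERED_TO_HANDLER : MC_SHUTDOWN;
}

/* Model of clearing the flag: the error is not masked, only the handler is. */
static unsigned long clear_cr4_mce(unsigned long cr4)
{
    return cr4 & ~CR4_MCE;
}
```

In this model, machine_check_outcome(clear_cr4_mce(cr4)) is always MC_SHUTDOWN, which is the point being made: clearing CR4.MCE does not make the machine check go away, it only removes the chance to handle it.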

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Thread Prarit Bhargava
On 02/27/2015 07:46 AM, Naoya Horiguchi wrote:
> Hi Prarit,
>
> On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
> ...
>> > @@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs
>> > *regs)
>> > /* The kernel is broken so disable interrupts */
>> > loc

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Thread Naoya Horiguchi
Hi Prarit,

On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
...
> @@ -157,6 +160,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
>    /* The kernel is broken so disable interrupts */
>    local_irq_disable();
>
> +  /*
> +   * We can't expect MCE handling to work a

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Thread Borislav Petkov
On Fri, Feb 27, 2015 at 06:09:52AM -0500, Prarit Bhargava wrote:
> What if the system is actually having problems with MCE errors --
> which are leading to system panics of some sort. Do you *really* want
> the system to continue on at that point?

No one said that disabling MCA and doing kdump is

Re: [PATCH v2 1/2] x86: mce: kexec: turn off MCE in kexec

2015-02-27 Thread Prarit Bhargava
On 02/26/2015 11:58 PM, Naoya Horiguchi wrote:
> kexec disables (or "shoots down") all CPUs other than a crashing CPU before
> entering the 2nd kernel. But the MCE handler is still enabled after that, so
> if MCE happens and broadcasts around CPUs after the main thread starts the
> 2nd kernel (wh