On Mon, Jan 5, 2015 at 4:44 PM, Luck, Tony <tony.l...@intel.com> wrote:
> We now switch to the kernel stack when a machine check interrupts
> during user mode. This means that we can perform recovery actions
> in the tail of do_machine_check()
>
> Signed-off-by: Tony Luck <tony.l...@intel.com>
>
> ---
> On top of Andy's x86/paranoid branch
> Andy: Should I really move that:
>     pr_err("Uncorrected hardware memory error ...
> inside the ist_begin_non_atomic() section?
>

I think I like it as is.

[...]

> @@ -1220,6 +1177,26 @@ void do_machine_check(struct pt_regs *regs, long error_code)
>         mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>  out:
>         sync_core();
> +
> +       if (recover_paddr == ~0ull)
> +               goto done;
> +
> +       pr_err("Uncorrected hardware memory error in user-access at %llx",
> +              recover_paddr);

printk is safe from IRQ context, so this should be okay unless we've
totally screwed up. And, if we totally screwed up, seeing this before
the BUGs in ist_begin_non_atomic would be nice.

> +       /*
> +        * We must call memory_failure() here even if the current process is
> +        * doomed. We still need to mark the page as poisoned and alert any
> +        * other users of the page.
> +        */
> +       ist_begin_non_atomic(regs);
> +       local_irq_enable();
> +       if (memory_failure(recover_paddr >> PAGE_SHIFT, MCE_VECTOR, flags) < 0) {
> +               pr_err("Memory error not recovered");
> +               force_sig(SIGBUS, current);
> +       }
> +       local_irq_disable();
> +       ist_end_non_atomic();
> +done:
>         ist_exit(regs, prev_state);
>  }

For the context-related bits:

Reviewed-by: Andy Lutomirski <l...@amacapital.net>

Should I stick this in my -next branch so it can stew?

--Andy

--
Andy Lutomirski
AMA Capital Management, LLC