On Wed, Nov 12, 2014 at 3:41 PM, Luck, Tony <tony.l...@intel.com> wrote:
>> v2 coming soon with these changes and some additional comment cleanups.
>

v2's not going to make a difference unless you're using uprobes at the
same time.

> So v1 + do_machine_check change is not surviving some real testing. I'm
> injecting and consuming errors sequentially with a small delay in between -
> so no fancy corner cases with multiple errors being processed ... we get
> all the way done with one error before we start the next. Test only
> survives about 400ish recoveries before Linux dies complaining:
> "Timeout synchronizing machine check over CPUs".
> This probably means that some cpu wandered into the weeds and never showed
> up in the handler.

In the interest of my sanity, can you add something like
BUG_ON(!user_mode_vm(regs)) or the mce_panic equivalent before calling
memory_failure?

What happens if there's a shared bank but the actual offender has a
higher order than the cpu that finds the error?

Is this something I can try under KVM?

--Andy