On Wed, Nov 12, 2014 at 3:41 PM, Luck, Tony <tony.l...@intel.com> wrote:
>> v2 coming soon with these changes and some additional comment cleanups.
>

v2's not going to make a difference unless you're using uprobes at the
same time.

> So v1 + do_machine_check change is not surviving some real testing.  I'm
> injecting and consuming errors sequentially with a small delay in between -
> so no fancy corner cases with multiple errors being processed ... we get all
> the way done with one error before we start the next.  Test only survives
> about 400ish recoveries before Linux dies complaining:
>     "Timeout synchronizing machine check over CPUs".
> This probably means that some cpu wandered into the weeds and never showed
> up in the handler.

In the interest of my sanity, can you add something like
BUG_ON(!user_mode_vm(regs)) or the mce_panic equivalent before calling
memory_failure?

What happens if there's a shared bank but the actual offender has a
higher order than the cpu that finds the error?

Is this something I can try under KVM?

--Andy