On Thu, May 22, 2014 at 8:51 AM, Borislav Petkov <b...@alien8.de> wrote:
>
> Regardless, exceptions like MCE cannot be held pending and do pierce the
> NMI handler on both.

No, that's fine, if it's a thread-synchronous thing (ie a memory load
that causes errors). But for NMI handlers, that is irrelevant: if the
NMI code itself gets memory errors, the machine really is dead. Let's
face it, we're going to panic and reboot, there's no other real
alternative (other than the "just log it, pray, and continue in
unstable mode", which is actually a perfectly valid alternative in many
cases, since people don't necessarily care deeply and have written
their distributed algorithms to not rely on any particular thread too
much, and will verify the end results anyway).

The problem is literally the non-synchronous things (like another CPU
having problems) where things like broadcast will actually turn a
non-thread-synchronous thing into problems for other CPUs. Then, a
user-mode memory access error (that we *can* recover from, perhaps by
killing the process and isolating the page) can turn into an
unrecoverable error on another CPU because it got interrupted at a
point where it really couldn't afford to be interrupted.

It appears Intel is fixing their braindamage.

                Linus