On Tue, Nov 11, 2014 at 1:36 PM, Borislav Petkov <b...@alien8.de> wrote:
> A very big hmmm...
>
> On Tue, Nov 11, 2014 at 12:56:52PM -0800, Andy Lutomirski wrote:
>> This causes all non-NMI kernel entries from userspace to run on the
>> normal kernel stack.
>
> So one of the reasons #MC has its own stack is because we need a
> known-good stack in such situations. What if the normal kernel stack is
> corrupted too due to a #MC?

I don't see why it would be any more likely for the normal kernel
stack to be corrupted due to a hardware issue that interrupted ring 3
code than that the IST stack is corrupted.

>
>> This means that machine check recovery can happen in non-atomic
>> context. It also obviates the need for the paranoid_userspace path.
>>
>> Borislav has referred to this idea as the tail wagging the dog. I
>> think that's okay -- the dog was pretty ugly.
>
> And I still am not sure about this: so the #MC handler makes implicit
> assumptions that while it is running nothing is going to interrupt it
> and it can access MCA MSRs. If you switch to process context, another
> #MC will preempt it and overwrite MCA MSRs. Which is a no-no.
>
> So unless I'm missing something - and I probably am - I don't think
> we can run #MC handler in process context. #MC is the highest prio
> abort-type exception along with processor reset for a reason.
>

I don't know what, if anything, masks and unmasks #MC, but certainly
switching to process context like this patch does will not unmask it.
Of course, if you sleep, then all bets are off.

--Andy