On Nov 12, 2014 2:30 AM, "Borislav Petkov" <b...@alien8.de> wrote: > > On Tue, Nov 11, 2014 at 06:06:48PM -0800, Tony Luck wrote: > > > Innocent bystanders have RIPV=1, EIPV=0 in MCG_STATUS ... so they > > > are quite easy to spot. > > > > Bother ... except for the SRAO cases where *everyone* is an innocent > > bystander - but someone should go look for the error and queue up > > a page offline event. Perhaps for this we'd do the self-ipi trick and > > then scan the banks to look for SRAO signatures to process as well > > as clearing MCIP > > First of all, good morning :-) > > So I'm reading all those notes again and I can't say I'm convinced about > the change. And all those "but but, what might happen if" cases are what > bother me. > > First, the stack: switching to a possibly almost full kernel stack > where we have other stuff already allocated is a bad idea IMO. There's > a good reason to have a known-good, solely-for-our-purpose stack which > we get to use. It is always there and it gives us the optimal conditions > possible for recovery. Switching back to the possibly crowded kernel > stack and causing stack overflows while handling an MCE is not an > improvement in my book.
I only switch stacks on entry from userspace, and the kernel stack is completely empty if that happens. > > Tony: > > The current code has an ugly hole at the moment. End of > > do_machine_check() clears MCG_STATUS. At that point we are still > > running on the magic stack for machine check exceptions ... if we take > > a machine check in the small window from there until we get off this > > stack (iret) ... then we will enter the handler back on the same stack > > that we haven't finished using yet. Bad things ensue. > > I'm not crazy about it either but how often does this actually happen in > practice? And if it does happen, then, well, we die, so be it. It can > happen with other MCEs coming in back-to-back like poisoned data (1st > MCE) being consumed on another core (2nd MCE). I know, Intel does the > broadcasting but that is going to change, I hear and AMD raises an MCE > on the core which detects it only. > > Another big issue I see with this is verifying the locking works and > generally testing this serious change. MCA is core kernel infrastructure > and we cannot break this left and right. And changing stacks we're > running on and the whole dynamics of the code might open a whole other > can of worms. One nice thing for testing is that my patch applies to int3 from userspace as well, and that's easy to test. > > Oh, and we shouldn't forget why we're doing this: just to save two > members to task_struct and getting rid of paranoid_userspace. Yep, still > not convinced the effort is worth it. I think I want to make this change anyway, though, since it may simplify fsgsbase support enough to justify it solely on that account. I don't think that the machine check code needs to change at all to accommodate a stack switch, but I think it makes some simplifications possible. > > Now, I think Andy had a much better/simpler/cleaner suggestion > yesterday: task_work_add(). It is basically what _TIF_MCE_NOTIFY does > and it is called in the same path - do_notify_resume() - but generic and > we can drop _TIF_MCE_NOTIFY then and use the generic thing. > > Much less intrusive change. Less intrusive is certainly true. --Andy > > -- > Regards/Gruss, > Boris. > > Sent from a fat crate under my desk. Formatting is fine. > -- -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/