On Wed, May 21, 2014 at 03:39:11PM -0700, Andy Lutomirski wrote:
> But if we get a new MCE in here, it will be an MCE from kernel context
> and it's fatal.  So, yes, we'll clobber the stack, but we'll never
> return (unless tolerant is set to something insane), so who cares?

Remember that machine checks are broadcast. So some other cpu can hit a
recoverable machine check in user mode ... but that int#18 goes everywhere.
Other cpus are innocent bystanders ... they will see MCG_STATUS.RIPV=1,
MCG_STATUS.EIPV=0 and nothing important in any of their machine check banks.
But if we are still finishing off processing the previous machine check,
this will be a nested one - and BOOM, we are dead.

-Tony

[If you peer closely at the latest edition of the SDM - you'll see the bits
are defined for a non-broadcast model ... e.g. LMCE_S bit in MCG_STATUS ...
but currently shipping silicon doesn't use that]
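
To make the "innocent bystander" condition above concrete, here is a minimal
userspace sketch, not kernel code. The bit positions are the ones the SDM
gives for IA32_MCG_STATUS and IA32_MCi_STATUS. The function names are
hypothetical, and reading "nothing important" as "no bank with both VAL and
UC set" is an assumption made only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits (SDM layout). */
#define MCG_STATUS_RIPV (1ULL << 0)  /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1)  /* error IP valid */

/* IA32_MCi_STATUS bits (SDM layout). */
#define MCI_STATUS_VAL  (1ULL << 63) /* bank holds a valid record */
#define MCI_STATUS_UC   (1ULL << 61) /* uncorrected error */

/*
 * Hypothetical helper. ASSUMPTION: a bank counts as "important" only if
 * it holds a valid, uncorrected error. Real handlers grade severity in
 * far more detail.
 */
bool bank_is_important(uint64_t status)
{
	return (status & MCI_STATUS_VAL) && (status & MCI_STATUS_UC);
}

/*
 * Hypothetical helper: true when this cpu's view of a broadcast machine
 * check matches the bystander pattern: RIPV=1, EIPV=0, and no bank
 * holding anything important.
 */
bool looks_like_bystander(uint64_t mcg_status,
			  const uint64_t *banks, size_t nbanks)
{
	size_t i;

	assert(banks != NULL || nbanks == 0);

	if (!(mcg_status & MCG_STATUS_RIPV))
		return false;	/* cannot safely restart */
	if (mcg_status & MCG_STATUS_EIPV)
		return false;	/* the error is tied to this cpu's IP */

	for (i = 0; i < nbanks; i++)
		if (bank_is_important(banks[i]))
			return false;

	return true;
}
```

The point of the sketch is only that a bystander cpu has nothing wrong with
it. It still takes the exception, so anything the handler clobbers on entry
gets clobbered there too.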