> MCE is frankly misdesigned. It's a piece of shit, and any of the
> hardware designers that claim that what they do is for system
> stability are out to lunch. This is a prime example of what *NOT* to
> do, and how you can actually spread what was potentially a localized
> and recoverable error, and make it global and unrecoverable.

Latest SDM (version 050 from late February this year) describes how
this is going to be fixed. Recoverable machine checks are going to be
thread local. But current silicon still has the broadcast behavior ...
silicon development pipeline is very long :-(

-Tony