> I've thought about one sneaky option.  If we can reliably determine
> that we're an innocent bystander of a broadcast #MC, can we send an
> IPI-to-self and return without clearing MCIP?  Then we get another
> interrupt as soon as interrupts are enabled, and we can clear MCIP at
> a time when we're definitely not running on the IST stack.

Innocent bystanders have RIPV=1, EIPV=0 in MCG_STATUS ... so they are
quite easy to spot.  Perhaps we might look at subverting the silly
broadcast by just having them immediately clear MCG_STATUS and iret
(i.e. not go to do_machine_check() at all).

That would require lots of surgery to do_machine_check() and friends -
it would no longer know how many processors to expect to show up.  It
also opens a different window - once the bystanders are back running
normal code they might trip another machine check while the victims of
the first are still processing - so another "boom, you're dead".  The
advantage of hitting everyone with the machine check is that it lessens
the chance that another will happen, as everyone is running looking at
a few pages of kernel code & data.

The worrying part of the IPI-to-self idea is "as soon as interrupts are
enabled".  Until we do clear MCIP we're sitting in a mode where another
machine check means instant death, no saving throw.  Nominally better
than the "we'll mess the stack up for you" that we are trying to avoid -
but the old window is quite short and known to be bounded.  The new one
might be a lot bigger.

-Tony
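
P.S. For reference, the bystander test mentioned above (RIPV=1, EIPV=0)
could be sketched roughly as below.  The bit positions are the
architectural IA32_MCG_STATUS layout; the helper name
mce_innocent_bystander() is made up for illustration and is not an
existing kernel function, and the sketch assumes the caller has already
read MCG_STATUS into a 64-bit value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in IA32_MCG_STATUS. */
#define MCG_STATUS_RIPV (1ULL << 0)	/* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1)	/* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2)	/* machine check in progress */

/*
 * Hypothetical helper: true when this CPU only received the broadcast
 * #MC and did not itself hit the error, i.e. RIPV=1 and EIPV=0.
 * MCIP and any other bits are ignored by the comparison.
 */
static bool mce_innocent_bystander(uint64_t mcg_status)
{
	uint64_t mask = MCG_STATUS_RIPV | MCG_STATUS_EIPV;

	return (mcg_status & mask) == MCG_STATUS_RIPV;
}
```

This only covers the "easy to spot" part.  It says nothing about when
it is safe to clear MCIP, which is the hard part of the discussion.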