Hi Thomas,

On Tue, Sep 17, 2019 at 08:37:10AM +0200, Thomas Gleixner wrote:
> > Microcode updates should be of 3 types.
> >
> > - Only loadable from BIOS (only via FIT tables)
> > - Suitable for early load (e.g. things that change cpuid bits)
> > - Suitable for late load (where no cpuid bits should change, etc.)
> >
> > Today, with the way we load after a stop_machine(), all threads in the
> > system are held hostage until all the cores have done the update. The
> > thread sibling is also in the rendezvous loop.
>
> I know. See below.
>
> > Do you think we still have that risk with a sibling thread?
> > (Assuming future ucodes don't do weird things like what happened in
> > that case where a cpuid was removed via an update)
>
> Well, yes. The sibling executes a limited set of instructions in a loop,
> but it might be hit by an NMI or MCE which executes even more instructions.

There is a plan to solve the NMI issue, although there is one case that we
might end up showing as spurious, which might not be nice. If #MCEs show up,
there is nothing we can do at that point. These are most likely
unrecoverable, but we want to make sure we can at least follow through with
a proper reset. Let me gather my thoughts on that when I have the patch
ready to handle those scenarios.

> So what happens if the ucode update "fixes" one of the executed
> instructions on the fly? Is that guaranteed to be safe? There is nothing
> which says so.
>
> A decade ago I experimented with putting the spinning CPUs into MWAIT,
> which caused havoc. Did neither have time nor the stomach to dig into that
> further, but the ucode update _did_ fix an issue with MWAIT according to
> the version history.

Excellent point.

> That's why I'm worried about instructions being "fixed" which are executed
> in parallel on the sibling.
>
> An authoritative statement vs. that would be appreciated. Preferably in form
> of an extension of the SDM, but an upfront statement in this thread would
> be a good start.

I have started the conversation internally. Once we have something solid,
I'll share it on the list and also follow up with updates to the SDM.

Cheers,
Ashok