On Tue, Sep 17, 2019 at 06:54:05AM +0000, Tony W Wang-oc wrote:
> But have a question about below codes:
> if (mcgstatus & MCG_STATUS_RIPV) {
> 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> 	return true;
> }
> These seems require all #MC exception errors set MCG_STATUS_RIPV = 1
> in order to skip synchronize which "return true;" actually does for this.
>
> As Intel SDM show, "Recoverable-not-continuable SRAR Type" errors may
> set MCG_STATUS_RIPV = 0, PCC = 0. When these #MC errors broadcast
> to offline CPU, may cause kernel panic with synchronize timeout (offline
> CPU can't skip synchronize in this case).
>
> Could "return true;" outside the if-case?
> if (mcgstatus & MCG_STATUS_RIPV) {
> 	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
> }
> return true;

If the RIPV bit is not set in mcgstatus, then where will the CPU return to
if you simply return from the #MC handler? RIPV=1 means that the CPU
pushed a valid return instruction pointer onto the stack.

Take the not-continuable case you mention above: the CPU will likely do
something undefined if you try to continue a not-continuable instruction.

-Tony
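
To make the difference concrete, here is a rough user-space sketch of the two
variants. It is not the kernel code: fake_mcg_status stands in for the MSR
that mce_wrmsrl() would write, the function names are made up for
illustration, and I am assuming the existing code falls through to
"return false" (i.e. normal handling) when RIPV is clear.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_RIPV (1ULL << 0)	/* restart IP valid */

/* Hypothetical stand-in for MSR_IA32_MCG_STATUS, for illustration only. */
static uint64_t fake_mcg_status;

/*
 * Existing behaviour: skip the rest of the handler only when the CPU
 * says the pushed instruction pointer is safe to return to.
 */
static bool skip_if_restartable(uint64_t mcgstatus)
{
	if (mcgstatus & MCG_STATUS_RIPV) {
		fake_mcg_status = 0;	/* models mce_wrmsrl(MSR_IA32_MCG_STATUS, 0) */
		return true;
	}
	return false;	/* assumed fall-through: do not skip */
}

/*
 * Proposed variant: "return true" moved outside the if. With RIPV=0 the
 * caller would return from #MC even though the restart IP is not valid.
 */
static bool skip_always(uint64_t mcgstatus)
{
	if (mcgstatus & MCG_STATUS_RIPV)
		fake_mcg_status = 0;
	return true;
}
```

With RIPV=0 the first version refuses to skip, while the second one reports
"skip" and leaves the status untouched, which is exactly the case where
there is nowhere safe to return to.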