On Tue, Sep 17, 2019 at 06:54:05AM +0000, Tony W Wang-oc wrote:
> But have a question about below codes:
>       if (mcgstatus & MCG_STATUS_RIPV) {
>               mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>               return true;
>       }
> These seems require all #MC exception errors set MCG_STATUS_RIPV = 1
> in order to skip synchronize which "return true;" actually does for this.
> 
> As Intel SDM show, "Recoverable-not-continuable SRAR Type" errors may
> set MCG_STATUS_RIPV = 0, PCC = 0. When these #MC errors broadcast
> to offline CPU, may cause kernel panic with synchronize timeout (offline
> CPU can't skip synchronize in this case).
> 
> Could "return true;" outside the if-case?
>       if (mcgstatus & MCG_STATUS_RIPV) {
>               mce_wrmsrl(MSR_IA32_MCG_STATUS, 0);
>       } 
>       return true; 

If the RIPV bit is not set in mcgstatus, then where will the CPU return
to if you simply return from the #MC handler? RIPV=1 means that the
CPU pushed a valid return instruction pointer onto the stack.

Take the not-continuable case you mention above: with RIPV=0 there
is no valid address to resume at, so the CPU will likely do something
undefined if you try to continue a not-continuable instruction.
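
To make the point concrete, here is a rough user-space sketch of the
check, not the actual kernel code. The bit positions match the
MCG_STATUS layout, but fake_mcg_status, fake_wrmsrl() and
offline_cpu_can_return() are hypothetical stand-ins I made up for
illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bits */
#define MCG_STATUS_RIPV (1ULL << 0)	/* restart IP on the stack is valid */
#define MCG_STATUS_EIPV (1ULL << 1)	/* IP on the stack relates to the error */
#define MCG_STATUS_MCIP (1ULL << 2)	/* machine check in progress */

/* Hypothetical stand-in for the MSR; real code uses rdmsr/wrmsr. */
static uint64_t fake_mcg_status;

/* Hypothetical stand-in for mce_wrmsrl(MSR_IA32_MCG_STATUS, val). */
static void fake_wrmsrl(uint64_t val)
{
	fake_mcg_status = val;
}

/*
 * Returning early from the #MC handler is only safe when RIPV=1,
 * i.e. the CPU pushed a valid return address. With RIPV=0 there is
 * nowhere valid to return to, so the caller must not just return.
 */
static bool offline_cpu_can_return(uint64_t mcgstatus)
{
	if (mcgstatus & MCG_STATUS_RIPV) {
		fake_wrmsrl(0);		/* clears MCIP along with the rest */
		return true;
	}
	return false;
}
```

With MCIP|RIPV set this clears the status and returns true. With
RIPV=0 (the SRAR case you describe) it returns false and leaves the
status untouched, which is why moving "return true;" outside the if
would have the handler return through an invalid instruction pointer.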

-Tony
