On POWER9 DD2.1 and below, it's possible for a paste instruction to cause a Machine Check Exception (MCE) where only DSISR bit 33 is set. This will result in the MCE handler seeing an unknown event, which triggers linux to crash.
We change this by detecting unknown events caused by load/stores in the MCE handler and marking them as handled so that we no longer crash. An MCE that occurs like this is spurious, so we don't need to do anything in terms of servicing it. If there is something that needs to be serviced, the CPU will raise the MCE again with the correct DSISR so that it can be serviced properly. Signed-off-by: Michael Neuling <mi...@neuling.org> Reviewed-by: Nicholas Piggin <npig...@gmail.com -- v3: Simplification and SRR1 check suggestions from Nick v2: update commit message based on Balbir's comments --- arch/powerpc/kernel/mce_power.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index b76ca198e0..e423cf0e43 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -624,5 +624,15 @@ long __machine_check_early_realmode_p8(struct pt_regs *regs) long __machine_check_early_realmode_p9(struct pt_regs *regs) { + /* + * On POWER9 DD2.1 and below, it's possible to get machine + * check caused by a paste instruction where only DSISR bit 33 + * is set. This will result in the MCE handler seeing an + * unknown event and us crashing. Change this to mark as + * handled. + */ + if (SRR1_MC_LOADSTORE(regs->msr) && regs->dsisr == 0x40000000) + return 1; + return mce_handle_error(regs, mce_p9_derror_table, mce_p9_ierror_table); } -- 2.11.0