Excerpts from Michael Ellerman's message of February 4, 2021 8:15 pm:
> Nicholas Piggin <npig...@gmail.com> writes:
>> This moves the common NMI entry and exit code into the interrupt handler
>> wrappers.
>>
>> This changes the behaviour of soft-NMI (watchdog) and HMI interrupts, and
>> also MCE interrupts on 64e, by adding missing parts of the NMI entry to
>> them.
>>
>> Signed-off-by: Nicholas Piggin <npig...@gmail.com>
>> ---
>>  arch/powerpc/include/asm/interrupt.h | 28 ++++++++++++++++++++++
>>  arch/powerpc/kernel/mce.c            | 11 ---------
>>  arch/powerpc/kernel/traps.c          | 35 +++++-----------------------
>>  arch/powerpc/kernel/watchdog.c       | 10 ++++----
>>  4 files changed, 38 insertions(+), 46 deletions(-)
>
> This is unhappy when injecting SLB multi-hits:
>
>   root@p86-2:~# echo PPC_SLB_MULTIHIT > /sys/kernel/debug/provoke-crash/DIRECT
>   [  312.496026][ T1344] kernel BUG at arch/powerpc/include/asm/interrupt.h:152!
>   [  312.496037][ T1344] Oops: Exception in kernel mode, sig: 5 [#1]
>   [  312.496045][ T1344] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries

pseries hash. Blast!

> 147 static inline void interrupt_nmi_exit_prepare(struct pt_regs *regs, struct interrupt_nmi_state *state)
> 148 {
> 149 	if (!IS_ENABLED(CONFIG_PPC_BOOK3S_64) ||
> 150 			!firmware_has_feature(FW_FEATURE_LPAR) ||
> 151 			radix_enabled() || (mfmsr() & MSR_DR))
> 152 		nmi_exit();
>
>
> So presumably it's:
>
> #define __nmi_exit()						\
> 	do {							\
> 		BUG_ON(!in_nmi());				\

Yes that would be it, pseries machine check enables MMU half way through
so only one side of this triggers.

The MSR_DR check is supposed to catch the other NMIs that run with MMU
on (perf, watchdog, etc). Suppose it could test TRAP(regs) explicitly
although I wonder if we should also do this to keep things balanced

diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
index 149cec2212e6..f57ca0c570be 100644
--- a/arch/powerpc/platforms/pseries/ras.c
+++ b/arch/powerpc/platforms/pseries/ras.c
@@ -719,6 +719,7 @@ static int mce_handle_err_virtmode(struct pt_regs *regs,
 static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
 {
+	unsigned long msr;
 	struct pseries_errorlog *pseries_log;
 	struct pseries_mc_errorlog *mce_log = NULL;
 	int disposition = rtas_error_disposition(errp);
@@ -747,9 +748,12 @@ static int mce_handle_error(struct pt_regs *regs, struct rtas_error_log *errp)
 	 * SLB multihit is done by now.
 	 */
 out:
-	mtmsr(mfmsr() | MSR_IR | MSR_DR);
+	msr = mfmsr();
+	mtmsr(msr | MSR_IR | MSR_DR);
 	disposition = mce_handle_err_virtmode(regs, errp, mce_log,
 					      disposition);
+	mtmsr(msr);
+
 	return disposition;
 }
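
To illustrate why only one side triggers, here is a rough userspace model of
the imbalance. It is a sketch, not kernel code: struct cpu_model, the model_*()
helpers and simulate_pseries_mce() are all made up for illustration. It assumes
the entry wrapper tests the same condition as the exit wrapper quoted above,
and that on a pseries hash LPAR this condition reduces to the MSR_DR test.

```c
#include <assert.h>
#include <stdbool.h>

/* MSR translation bits, as defined for powerpc. */
#define MSR_IR 0x20UL
#define MSR_DR 0x10UL

/* Hypothetical stand-in for the CPU state this discussion cares about. */
struct cpu_model {
	unsigned long msr;	/* modelled MSR value */
	int nmi_depth;		/* modelled in_nmi() nesting count */
	bool bug_hit;		/* set where BUG_ON(!in_nmi()) would fire */
};

/* On pseries hash, only the (mfmsr() & MSR_DR) term of the check can be true. */
static bool nmi_accounting_wanted(const struct cpu_model *cpu)
{
	return (cpu->msr & MSR_DR) != 0;
}

/* Entry side (assumed to mirror the exit condition). */
static void model_nmi_enter(struct cpu_model *cpu)
{
	if (nmi_accounting_wanted(cpu))
		cpu->nmi_depth++;
}

/* Exit side: record the BUG instead of crashing. */
static void model_nmi_exit(struct cpu_model *cpu)
{
	if (!nmi_accounting_wanted(cpu))
		return;
	if (cpu->nmi_depth == 0) {
		cpu->bug_hit = true;	/* nmi_exit() without a matching nmi_enter() */
		return;
	}
	cpu->nmi_depth--;
}

/* Models mce_handle_error(): turn translation on, optionally restore the MSR. */
static void model_mce_handle_error(struct cpu_model *cpu, bool restore_msr)
{
	unsigned long msr = cpu->msr;

	cpu->msr = msr | MSR_IR | MSR_DR;
	/* ... virtual-mode handling would run here ... */
	if (restore_msr)
		cpu->msr = msr;
}

/* Returns true if the modelled BUG_ON would have fired. */
bool simulate_pseries_mce(bool restore_msr)
{
	/* The machine check is entered in real mode: MSR_IR and MSR_DR clear. */
	struct cpu_model cpu = { .msr = 0, .nmi_depth = 0, .bug_hit = false };

	model_nmi_enter(&cpu);
	model_mce_handle_error(&cpu, restore_msr);
	model_nmi_exit(&cpu);

	assert(cpu.nmi_depth == 0);
	return cpu.bug_hit;
}
```

In this model, simulate_pseries_mce(false) corresponds to the current code and
reports the BUG, while simulate_pseries_mce(true) corresponds to the patch
above and stays balanced because the exit check sees the same MSR as the entry
check did.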