On Thu, 21 Sep 2017 19:57:20 +1000 Michael Neuling <mi...@neuling.org> wrote:
> On Thu, 2017-09-21 at 18:18 +1000, Nicholas Piggin wrote:
> > On Thu, 21 Sep 2017 12:04:34 +1000
> > Michael Neuling <mi...@neuling.org> wrote:
> > 
> > > On POWER9 DD2.1 and below, it's possible to get Machine Check
> > > Exception (MCE) where only DSISR bit 33 is set. This will result in
> > > the linux MCE handler seeing an unknown event, which triggers linux to
> > > crash.
> > > 
> > > We change this by detecting unknown events in the MCE handler and
> > > marking them as handled so that we no longer crash. We do this only on
> > > chip revisions known to have this problem.
> > > 
> > > MCE that occurs like this is spurious, so we don't need to do anything
> > > in terms of servicing it. If there is something that needs to be
> > > serviced, the CPU will raise the MCE again with the correct DSISR so
> > > that it can be serviced properly.
> > > 
> > > Signed-off-by: Michael Neuling <mi...@neuling.org>
> > > ---
> > > v2 update commit message based on Balbir's comments
> > > ---
> > >  arch/powerpc/kernel/mce_power.c | 15 +++++++++++++++
> > >  1 file changed, 15 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/kernel/mce_power.c
> > > b/arch/powerpc/kernel/mce_power.c
> > > index b76ca198e0..72ec667136 100644
> > > --- a/arch/powerpc/kernel/mce_power.c
> > > +++ b/arch/powerpc/kernel/mce_power.c
> > > @@ -595,6 +595,7 @@ static long mce_handle_error(struct pt_regs *regs,
> > >  	uint64_t addr;
> > >  	uint64_t srr1 = regs->msr;
> > >  	long handled;
> > > +	unsigned long pvr;
> > >  
> > >  	if (SRR1_MC_LOADSTORE(srr1))
> > >  		handled = mce_handle_derror(regs, dtable, &mce_err, &addr);
> > > @@ -604,6 +605,20 @@ static long mce_handle_error(struct pt_regs *regs,
> > >  	if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE)
> > >  		handled = mce_handle_ue_error(regs);
> > >  
> > > +	/*
> > > +	 * On POWER9 DD2.1 and below, it's possible to get machine
> > > +	 * check where only DSISR bit 33 is set. This will result in
> > > +	 * the MCE handler seeing an unknown event and us crashing.
> > > +	 * Change this to mark as handled on these revisions.
> > > +	 */
> > > +	pvr = mfspr(SPRN_PVR);
> > > +	if (((PVR_VER(pvr) == PVR_POWER9) &&
> > > +	     (PVR_CFG(pvr) == 2) &&
> > > +	     (PVR_MIN(pvr) <= 1)) || cpu_has_feature(CPU_FTR_POWER9_DD1))
> > > +		/* DD2.1 and below */
> > > +		if (mce_err.error_type == MCE_ERROR_TYPE_UNKNOWN)
> > > +			handled = 1;
> > 
> > I might be missing something, but can you just do
> > 
> > 	if (regs->dsisr == 0x40000000)
> > 		return 1;
> > 
> > In __machine_check_early_realmode_p9() ?
> 
> You're right, thanks.

If you leave the PVR and DD1 checks in there, it would be a good
reminder for me to convert into a quirk if I can get this version
specific quirks stuff going

https://marc.info/?l=linuxppc-embedded&m=150597337720114&w=2

Thanks,
Nick
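
For reference, here is a rough standalone sketch of how the two ideas in this thread might combine: keep the revision check from the patch, but test the DSISR value directly as suggested above. It is not the code that was merged. The helper names (is_p9_dd21_or_below, spurious_mce_handled) and the is_dd1 flag, which stands in for cpu_has_feature(CPU_FTR_POWER9_DD1), are made up for illustration. The PVR macros are written out to mirror the kernel's field layout as I understand it, and 0x40000000 is DSISR bit 33 in IBM (MSB = 0) numbering.

```c
#include <stdbool.h>
#include <stdint.h>

/* PVR field extraction, written to mirror the kernel's PVR_* macros. */
#define PVR_VER(pvr)	(((pvr) >> 16) & 0xFFFF)	/* processor version */
#define PVR_CFG(pvr)	(((pvr) >> 8) & 0xF)		/* major (DD) level  */
#define PVR_MIN(pvr)	((pvr) & 0xF)			/* minor revision    */
#define PVR_POWER9	0x004E

/* Only DSISR bit 33 set (IBM bit numbering, bit 0 is the MSB of 64). */
#define DSISR_BIT33_ONLY	0x40000000UL

/*
 * Hypothetical helper: true on POWER9 DD2.1 and below.
 * is_dd1 stands in for cpu_has_feature(CPU_FTR_POWER9_DD1).
 */
static bool is_p9_dd21_or_below(uint32_t pvr, bool is_dd1)
{
	if (is_dd1)
		return true;

	return PVR_VER(pvr) == PVR_POWER9 &&
	       PVR_CFG(pvr) == 2 &&
	       PVR_MIN(pvr) <= 1;
}

/*
 * Hypothetical helper: returns 1 if the machine check should be
 * treated as spurious and marked handled, 0 otherwise.
 */
static long spurious_mce_handled(uint32_t pvr, bool is_dd1, uint64_t dsisr)
{
	if (!is_p9_dd21_or_below(pvr, is_dd1))
		return 0;

	/* Exact match: any other DSISR bit means a real event. */
	return dsisr == DSISR_BIT33_ONLY ? 1 : 0;
}
```

In the kernel the PVR would come from mfspr(SPRN_PVR) and the DSISR from regs->dsisr; here both are plain arguments so the logic can be checked in userspace. The exact comparison (rather than a bit test) is deliberate: if any other DSISR bit is set, the event falls through to the normal handling.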