Re: [resend-without-rfc] powernv/kdump: Fix cases where the kdump kernel can get HMI's

Nicholas Piggin Fri, 08 Dec 2017 05:21:07 -0800

On Fri, 08 Dec 2017 22:46:49 +1100
Michael Ellerman <m...@ellerman.id.au> wrote:


> Nicholas Piggin <npig...@gmail.com> writes:
> 
> > On Fri,  8 Dec 2017 14:35:33 +1100
> > Balbir Singh <bsinghar...@gmail.com> wrote:
> >  
> >> Certain HMI's such as malfunction error propagate through
> >> all threads/core on the system. If a thread was offline
> >> prior to us crashing the system and jumping to the kdump
> >> kernel, bad things happen when it wakes up due to an HMI
> >> in the kdump kernel.
> >> 
> >> There are several possible ways to solve this problem
> >> 
> >> 1. Put the offline cores in a state such that they are
> >> not woken up for machine check and HMI errors. This
> >> does not work, since we might need to wake up offline
> >> threads occasionally to handle TB errors
> >> 2. Ignore HMI errors, setup HMEER to mask HMI errors,
> >> but this still leads the window open for any MCEs
> >> and masking them for the duration of the dump might
> >> be a concern
> >> 3. Wake up offline CPUs, as in send them to crash_ipi_callback
> >> (not wake them up as in mark them online as seen by
> >> the scheduler). kexec does a wake_online_cpus() call,
> >> this patch does something similar, but instead sends
> >> an IPI and forces them to crash_ipi_callback
> >> 
> >> Care is taken to enable this only for powenv platforms
> >> via crash_wake_offline (a global value set at setup
> >> time). The crash code sends out IPI's to all CPU's
> >> which then move to crash_ipi_callback and kexec_smp_wait().
> >> We don't grab the pt_regs for offline CPU's.
> >> 
> >> Signed-off-by: Balbir Singh <bsinghar...@gmail.com>
> >> ---
> >> 
> >> Nick reviewed the patches and asked if
> >> 
> >> 1. We need to do anything on the otherside of the kernel?
> >> The answer is not clear at this point, but I don't want
> >> to block this patch as it fixes a critical problem with
> >> kdump in SMT=2/1 mode
> >> 2. We should do this for other platforms
> >> The answer is same as above, other platforms require testing
> >> and I can selectively enable them as needed as I test them  
> >
> > Yeah I didn't intend those as a nack for the patch... It's
> > a bit annoying to have these selections between online cpus
> > and present cpus depending on kdump.
> >
> > We don't want to do a full CPU online in the kdump path of
> > course, but what if the crash code has a call that can IPI
> > offline CPUs to get them into the crash callback, rather than
> > put it in the general NMI IPI code?  
> 
> Yeah. I reworked it with Balbir and I think that's pretty much what we
> came up with. The crash code does the normal NMI callback and then goes
> through any offline CPUs and sends them a do-nothing NMI.

Okay good. Well actually it doesn't need to be an NMI because it
comes directly out of stop, so a doorbell would actually be less
code. Anyway NBD.

> 
> >> @@ -187,6 +188,14 @@ static void pnv_smp_cpu_kill_self(void)
> >>            WARN_ON(lazy_irq_pending());
> >>  
> >>            /*
> >> +           * For kdump kernels, we process the ipi and jump to
> >> +           * crash_ipi_callback. For more details see the description
> >> +           * at crash_wake_offline
> >> +           */
> >> +          if (kdump_in_progress())
> >> +                  crash_ipi_callback(NULL);
> >> +
> >> +          /*
> >>             * If the SRR1 value indicates that we woke up due to
> >>             * an external interrupt, then clear the interrupt.
> >>             * We clear the interrupt before checking for the  
> >
> > I think you need to do this _after_ clearing the interrupt,
> > otherwise you get a lost wakeup window, don't you?  
> 
> Yeah I originally thought that, but it depends how you actually do the
> wake up I think.
> 
> On P9 bare metal we will do OPAL_SIGNAL_SYS_RESET. I don't think that
> actually sends an interrupt does it? But that's just from me skimming
> the opal code you wrote :)

It effectively does send an interrupt. Hotplug is in stop, so it
will take a powersave wakeup system reset with system reset reason
in SRR1.

> 
> For the non-NMI case where it's a doorbell or xics/xive IPI yes as
> currently written it will leave the interrupt uncleared.
> 
> On P9 if we don't have the NMI for some reason we should be using
> doorbells, which is nice because it's just the msgclr, ie. we don't have
> to call xive and run all that code.
> 
> On P8 it'll be a doorbell for threads with the same core, and for other
> threads we'll call icp_native_flush_interrupt() which is also nice and
> small and unlikely to break during kdump.

I was worried that you take a superfluous interrupt for some other
reason, then you get past the kdump check, then another CPU crashes
and sends you a dump IPI, then you clear it and lose it.

Re: [resend-without-rfc] powernv/kdump: Fix cases where the kdump kernel can get HMI's

Reply via email to