Hi,

On Thu, Dec 7, 2023 at 5:03 PM Douglas Anderson <diand...@chromium.org> wrote:
>
> When testing hard lockup handling on my sc7180-trogdor-lazor device
> with pseudo-NMI enabled, with serial console enabled and with kgdb
> disabled, I found that the stack crawls printed to the serial console
> ended up as a jumbled mess. After rebooting, the pstore-based console
> looked fine though. Also, enabling kgdb to trap the panic made the
> console look fine and avoided the mess.
>
> After a bit of tracking down, I came to the conclusion that this was
> what was happening:
> 1. The panic path was stopping all other CPUs with
>    panic_other_cpus_shutdown().
> 2. At least one of those other CPUs was in the middle of printing to
>    the serial console and holding the console port's lock, which is
>    grabbed with "irqsave". ...but since we were stopping with an NMI
>    we didn't care about the "irqsave" and interrupted anyway.
> 3. Since we stopped the CPU while it was holding the lock it would
>    never release it.
> 4. All future calls to output to the console would end up failing to
>    get the lock in qcom_geni_serial_console_write(). This isn't
>    _totally_ unexpected at panic time but it's a code path that's not
>    well tested, hard to get right, and apparently doesn't work
>    terribly well on the Qualcomm geni serial driver.
>
> It would probably be a reasonable idea to try to make the Qualcomm
> geni serial driver work better, but also it's nice not to get into
> this situation in the first place.
>
> Taking a page from what x86 appears to do in native_stop_other_cpus(),
> let's do this:
> 1. First, we'll try to stop other CPUs with a normal IPI and wait a
>    second. This gives them a chance to leave critical sections.
> 2. If CPUs fail to stop then we'll retry with an NMI, but give a much
>    lower timeout since there's no good reason for a CPU not to react
>    quickly to a NMI.
>
> This works well and avoids the corrupted console and (presumably)
> could help avoid other similar issues.
>
> In order to do this, we need to do a little re-organization of our
> IPIs since we don't have any more free IDs. We'll do what was
> suggested in previous conversations and combine "stop" and "crash
> stop". That frees up an IPI so now we can have a "stop" and "stop
> NMI".
>
> In order to do this we also need a slight change in the way we keep
> track of which CPUs still need to be stopped. We need to know
> specifically which CPUs haven't stopped yet when we fall back to NMI
> but in the "crash stop" case the "cpu_online_mask" isn't updated as
> CPUs go down. This is why that code path had an atomic of the number
> of CPUs left. We'll solve this by making the cpumask into a
> global. This has a potential memory implication--with NR_CPUs = 4096
> this is 4096/8 = 512 bytes of globals. On the upside in that same case
> we take 512 bytes off the stack which could potentially have made the
> stop code less reliable. It can be noted that the NMI backtrace code
> (lib/nmi_backtrace.c) uses the same approach and that use also
> confirms that updating the mask is safe from NMI.
>
> All of the above lets us combine the logic for "stop" and "crash stop"
> code, which appeared to have a bunch of arbitrary implementation
> differences. Possibly this could make up for some of the 512 wasted
> bytes. ;-)
>
> Aside from the above change where we try a normal IPI and then an NMI,
> the combined function has a few subtle differences:
> * In the normal smp_send_stop(), if we fail to stop one or more CPUs
>   then we won't include the current CPU (the one running
>   smp_send_stop()) in the error message.
> * In crash_smp_send_stop(), if we fail to stop some CPUs we'll print
>   the CPUs that we failed to stop instead of printing all _but_ the
>   current running CPU.
> * In crash_smp_send_stop(), we will now only print "SMP: stopping
>   secondary CPUs" if (system_state <= SYSTEM_RUNNING).
>
> Fixes: d7402513c935 ("arm64: smp: IPI_CPU_STOP and IPI_CPU_CRASH_STOP should try for NMI")
> Signed-off-by: Douglas Anderson <diand...@chromium.org>
> ---
> I'm not setup to test the crash_smp_send_stop(). I made sure it
> compiled and hacked the panic() method to call it, but I haven't
> actually run kexec. Hopefully others can confirm that it's working for
> them.
>
>  arch/arm64/kernel/smp.c | 115 +++++++++++++++++++---------------------
>  1 file changed, 54 insertions(+), 61 deletions(-)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index defbab84e9e5..9fe9d4342517 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -71,7 +71,7 @@ enum ipi_msg_type {
>  	IPI_RESCHEDULE,
>  	IPI_CALL_FUNC,
>  	IPI_CPU_STOP,
> -	IPI_CPU_CRASH_STOP,
> +	IPI_CPU_STOP_NMI,
>  	IPI_TIMER,
>  	IPI_IRQ_WORK,
>  	NR_IPI,
> @@ -88,6 +88,9 @@ static int ipi_irq_base __ro_after_init;
>  static int nr_ipi __ro_after_init = NR_IPI;
>  static struct irq_desc *ipi_desc[MAX_IPI] __ro_after_init;
>
> +static DECLARE_BITMAP(stop_mask, NR_CPUS) __read_mostly;
> +static bool crash_stop;
> +
>  static void ipi_setup(int cpu);
>
>  #ifdef CONFIG_HOTPLUG_CPU
> @@ -770,7 +773,7 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
>  	[IPI_RESCHEDULE] = "Rescheduling interrupts",
>  	[IPI_CALL_FUNC] = "Function call interrupts",
>  	[IPI_CPU_STOP] = "CPU stop interrupts",
> -	[IPI_CPU_CRASH_STOP] = "CPU stop (for crash dump) interrupts",
> +	[IPI_CPU_STOP_NMI] = "CPU stop NMIs",
>  	[IPI_TIMER] = "Timer broadcast interrupts",
>  	[IPI_IRQ_WORK] = "IRQ work interrupts",
>  };
> @@ -831,17 +834,11 @@ void __noreturn panic_smp_self_stop(void)
>  	local_cpu_stop();
>  }
>
> -#ifdef CONFIG_KEXEC_CORE
> -static atomic_t waiting_for_crash_ipi = ATOMIC_INIT(0);
> -#endif
> -
>  static void __noreturn ipi_cpu_crash_stop(unsigned int cpu, struct pt_regs *regs)
>  {
>  #ifdef CONFIG_KEXEC_CORE
>  	crash_save_cpu(regs, cpu);
>
> -	atomic_dec(&waiting_for_crash_ipi);

Upon reading the patch with fresh eyes, I think I actually need to
move the "cpumask_clear_cpu(cpu, to_cpumask(stop_mask))" here.
Specifically I think it's important that it happens _after_ the call
to crash_save_cpu(), just like the atomic_dec() it replaces. Otherwise
the CPU waiting on stop_mask could, I believe, see the mask go empty
and carry on before this CPU's state has been saved.

>  	local_irq_disable();

The above local_irq_disable() is not new in my patch but it seems
wonky for two reasons:
1. It feels like it should have been the first thing in the function.
2. It feels like it should be local_daif_mask() instead.

I _think_ it doesn't actually matter because, with the current code,
we're only ever called from do_handle_IPI() and thus local IRQs will
be off (and local NMIs will be off if we're called from NMI context).
However, once we have the IRQ + NMI fallback it _might_ matter if we
were midway through finally handling the IRQ-based IPI when we decided
to try the NMI-based one.

For the next spin of the patch I'll plan to get rid of the
local_irq_disable() and instead have local_daif_mask() be the first
thing that this function does.
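
To make the intended ordering concrete, here is a rough user-space
model of it. This is not kernel code and not the actual next version
of the patch: the model_* helpers, the trace array and the *_model
variables are made up purely for illustration, and they only record
the order in which the real local_daif_mask(), crash_save_cpu() and
cpumask_clear_cpu() calls would be expected to happen.

```c
#include <assert.h>
#include <stdbool.h>

/* Order-of-operations markers recorded by the stand-in helpers below. */
enum step { STEP_MASK_DAIF, STEP_SAVE_CPU, STEP_CLEAR_MASK };

#define MAX_STEPS 8
static enum step trace[MAX_STEPS];
static int ntrace;

/* Model of stop_mask: one bit per CPU that has not stopped yet. */
static unsigned long stop_mask_model;
/* Model of crash_stop: true when stopping for a crash dump. */
static bool crash_stop_model;

static void record(enum step s)
{
	if (ntrace < MAX_STEPS)
		trace[ntrace++] = s;
}

/* Hypothetical stand-in for local_daif_mask(): mask exceptions locally. */
static void model_local_daif_mask(void)
{
	record(STEP_MASK_DAIF);
}

/* Hypothetical stand-in for crash_save_cpu(): save this CPU's state. */
static void model_crash_save_cpu(unsigned int cpu)
{
	(void)cpu;
	record(STEP_SAVE_CPU);
}

/* Hypothetical stand-in for cpumask_clear_cpu(cpu, to_cpumask(stop_mask)). */
static void model_clear_stop_bit(unsigned int cpu)
{
	assert(cpu < 8 * sizeof(stop_mask_model));
	stop_mask_model &= ~(1UL << cpu);
	record(STEP_CLEAR_MASK);
}

/* Proposed ordering for the stop handler; the final park loop is omitted. */
static void model_ipi_cpu_stop(unsigned int cpu)
{
	model_local_daif_mask();		/* first thing in the function */
	if (crash_stop_model)
		model_crash_save_cpu(cpu);	/* only in the crash-dump case */
	model_clear_stop_bit(cpu);		/* last: the waiter may now move on */
}
```

With crash_stop_model set and CPUs 1 and 2 still pending
(stop_mask_model = 0x6), calling model_ipi_cpu_stop(1) records "mask,
save, clear" in that order and leaves only bit 2 set. The point of the
ordering is that the CPU polling the mask never sees a bit cleared for
a CPU whose state has not been saved yet.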