Excerpts from Nicholas Piggin's message of November 10, 2021 12:50 pm:
> @@ -160,11 +187,26 @@ static void watchdog_smp_panic(int cpu, u64 tb)
>  		goto out;
>  	if (cpumask_test_cpu(cpu, &wd_smp_cpus_pending))
>  		goto out;
> -	if (cpumask_weight(&wd_smp_cpus_pending) == 0)
> +	if (!wd_try_report())
>  		goto out;
> +	for_each_online_cpu(c) {
> +		if (!cpumask_test_cpu(c, &wd_smp_cpus_pending))
> +			continue;
> +		if (c == cpu)
> +			continue; // should not happen
> +
> +		__cpumask_set_cpu(c, &wd_smp_cpus_ipi);
> +		if (set_cpu_stuck(c, tb))
> +			break;
> +	}
> +	if (cpumask_empty(&wd_smp_cpus_ipi)) {
> +		wd_end_reporting();
> +		goto out;
> +	}
> +	wd_smp_unlock(&flags);
>
>  	pr_emerg("CPU %d detected hard LOCKUP on other CPUs %*pbl\n",
> -		 cpu, cpumask_pr_args(&wd_smp_cpus_pending));
> +		 cpu, cpumask_pr_args(&wd_smp_cpus_ipi));
>  	pr_emerg("CPU %d TB:%lld, last SMP heartbeat TB:%lld (%lldms ago)\n",
>  		 cpu, tb, wd_smp_last_reset_tb,
>  		 tb_to_ns(tb - wd_smp_last_reset_tb) / 1000000);

Oops, this has a bug: wd_smp_last_reset_tb gets reset above by
set_cpu_stuck when all the stuck CPUs are taken out of the pending
mask, so this prints nonsense last-reset times.

I might just send out an updated series, because the fix has a slight
clash with the next patch. All I do is take a local copy of
wd_smp_last_reset_tb near the start of the function.

Thanks,
Nick
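
For illustration only, here is a simplified userspace sketch of the fix
described above: snapshot wd_smp_last_reset_tb before the loop that may
reset it, and report from the snapshot. This is not the actual patch.
The bitmask standing in for the cpumask, the 32-CPU limit, the reduced
set_cpu_stuck(), and the report_stuck() helper are all assumptions made
for the example.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the watchdog state (not the kernel code). */
static uint64_t wd_smp_last_reset_tb;    /* timebase of the last heartbeat reset */
static unsigned int wd_smp_cpus_pending; /* bitmask standing in for the cpumask */

/*
 * Simplified model: take CPU c out of the pending mask. If the mask
 * becomes empty, reset the heartbeat timestamp to tb and return 1.
 */
static int set_cpu_stuck(int c, uint64_t tb)
{
	wd_smp_cpus_pending &= ~(1u << c);
	if (wd_smp_cpus_pending == 0) {
		wd_smp_last_reset_tb = tb;
		return 1;
	}
	return 0;
}

/*
 * Hypothetical reporting helper. Returns the number of timebase ticks
 * since the last heartbeat, as it would be reported.
 */
static uint64_t report_stuck(int cpu, uint64_t tb)
{
	/* Local copy taken before set_cpu_stuck() can overwrite the global. */
	uint64_t last_reset = wd_smp_last_reset_tb;
	int c;

	for (c = 0; c < 32; c++) {
		if (!(wd_smp_cpus_pending & (1u << c)))
			continue;
		if (c == cpu)
			continue;
		if (set_cpu_stuck(c, tb))
			break;
	}

	/* Using the global here would report 0 ticks once it has been reset. */
	printf("CPU %d TB:%llu, last SMP heartbeat TB:%llu (%llu ticks ago)\n",
	       cpu, (unsigned long long)tb,
	       (unsigned long long)last_reset,
	       (unsigned long long)(tb - last_reset));
	return tb - last_reset;
}
```

With the last reset at TB 1000 and CPUs 1 and 2 pending, calling
report_stuck(0, 5000) reports 4000 ticks from the snapshot, even though
the global has been reset to 5000 by the time the message is printed.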