Excerpts from Nicholas Piggin's message of November 10, 2021 12:50 pm:
> @@ -160,11 +187,26 @@ static void watchdog_smp_panic(int cpu, u64 tb)
>               goto out;
>       if (cpumask_test_cpu(cpu, &wd_smp_cpus_pending))
>               goto out;
> -     if (cpumask_weight(&wd_smp_cpus_pending) == 0)
> +     if (!wd_try_report())
>               goto out;
> +     for_each_online_cpu(c) {
> +             if (!cpumask_test_cpu(c, &wd_smp_cpus_pending))
> +                     continue;
> +             if (c == cpu)
> +                     continue; // should not happen
> +
> +             __cpumask_set_cpu(c, &wd_smp_cpus_ipi);
> +             if (set_cpu_stuck(c, tb))
> +                     break;
> +     }
> +     if (cpumask_empty(&wd_smp_cpus_ipi)) {
> +             wd_end_reporting();
> +             goto out;
> +     }
> +     wd_smp_unlock(&flags);
>  
>       pr_emerg("CPU %d detected hard LOCKUP on other CPUs %*pbl\n",
> -              cpu, cpumask_pr_args(&wd_smp_cpus_pending));
> +              cpu, cpumask_pr_args(&wd_smp_cpus_ipi));
>       pr_emerg("CPU %d TB:%lld, last SMP heartbeat TB:%lld (%lldms ago)\n",
>                cpu, tb, wd_smp_last_reset_tb,
>                tb_to_ns(tb - wd_smp_last_reset_tb) / 1000000);

Oops, this has a bug: wd_smp_last_reset_tb gets reset above by
set_cpu_stuck when all the stuck CPUs are taken out of the pending
mask, so this prints nonsense last-reset times.

I might just send out an updated series, because the fix has a slight
clash with the next patch. The fix is simply to take a local copy of
wd_smp_last_reset_tb near the start of the function and print that
instead of the global.
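
For illustration, here is a minimal userspace model of the problem and
the fix. It is only a sketch, not the kernel code: the bitmask, the
simplified set_cpu_stuck() body, and the report_delta() and
wd_model_selftest() helpers are assumptions made up for this example.
Only the names wd_smp_last_reset_tb and set_cpu_stuck come from the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel state (simplified, hypothetical). */
static uint64_t wd_smp_last_reset_tb;  /* last SMP heartbeat timebase */
static unsigned int pending_mask;      /* models wd_smp_cpus_pending, one bit per CPU */

/*
 * Simplified model of set_cpu_stuck(): remove the CPU from the pending
 * mask and, once the mask is empty, reset the heartbeat timestamp to tb.
 * Returns 1 if the reset happened, 0 otherwise.
 */
static int set_cpu_stuck(int cpu, uint64_t tb)
{
	pending_mask &= ~(1u << cpu);
	if (pending_mask == 0) {
		wd_smp_last_reset_tb = tb;
		return 1;
	}
	return 0;
}

/*
 * Hypothetical helper modelling the reporting path. Returns the timebase
 * delta that would be printed as "last SMP heartbeat ... ago".
 * use_snapshot == 0 reads the global after the loop (the buggy variant);
 * use_snapshot != 0 uses the local copy taken up front (the fix).
 */
static uint64_t report_delta(int cpu, uint64_t tb, int use_snapshot)
{
	uint64_t last_reset = wd_smp_last_reset_tb; /* local copy near the start */
	int c;

	for (c = 0; c < 32; c++) {
		if (!(pending_mask & (1u << c)))
			continue;
		if (c == cpu)
			continue;
		if (set_cpu_stuck(c, tb))
			break;
	}

	return tb - (use_snapshot ? last_reset : wd_smp_last_reset_tb);
}

/* Exercises both variants with CPUs 1 and 2 pending, last reset at TB 1000. */
static void wd_model_selftest(void)
{
	wd_smp_last_reset_tb = 1000;
	pending_mask = 0x6;
	/* Global was overwritten with tb inside the loop, so the delta is 0. */
	assert(report_delta(0, 5000, 0) == 0);

	wd_smp_last_reset_tb = 1000;
	pending_mask = 0x6;
	/* The snapshot still holds the real last-reset value. */
	assert(report_delta(0, 5000, 1) == 4000);
}
```

In this model the buggy variant always reports a delta of zero once the
last pending CPU is marked stuck, while the snapshot variant reports the
real time since the last heartbeat.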

Thanks,
Nick
