On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> Add a NMI_WATCHDOG as a new category of NMI handler. This new category
> is to be used with the HPET-based hardlockup detector. This detector
> does not have a direct way of checking if the HPET timer is the source of
> the NMI. Instead, it indirectly estimates it using the time-stamp counter.
>
> Therefore, we may have false-positives in case another NMI occurs within
> the estimated time window. For this reason, we want the handler of the
> detector to be called after all the NMI_LOCAL handlers. A simple way
> of achieving this with a new NMI handler category.
>
> @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
>       }
>       raw_spin_unlock(&nmi_reason_lock);
>  
> +     handled = nmi_handle(NMI_WATCHDOG, regs);
> +     if (handled == NMI_HANDLED)
> +             goto out;
> +

How is this supposed to work reliably?

If perf is active and the HPET NMI and the perf NMI come in around the
same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog
won't be checked. Because MSI is strictly edge and the message is only
sent once, this can result in a stale watchdog, no?

Thanks,

        tglx



Reply via email to