On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> Add a NMI_WATCHDOG as a new category of NMI handler. This new category
> is to be used with the HPET-based hardlockup detector. This detector
> does not have a direct way of checking if the HPET timer is the source of
> the NMI. Instead, it indirectly estimates it using the time-stamp counter.
>
> Therefore, we may have false-positives in case another NMI occurs within
> the estimated time window. For this reason, we want the handler of the
> detector to be called after all the NMI_LOCAL handlers. A simple way
> of achieving this with a new NMI handler category.
>
> @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
>  	}
>  	raw_spin_unlock(&nmi_reason_lock);
>
> +	handled = nmi_handle(NMI_WATCHDOG, regs);
> +	if (handled == NMI_HANDLED)
> +		goto out;
> +

How is this supposed to work reliably?

If perf is active and the HPET NMI and the perf NMI come in around the
same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog
won't be checked. Because MSI is strictly edge and the message is only
sent once, this can result in a stale watchdog, no?

Thanks,

        tglx
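
The scenario in question can be sketched as a small userspace model. This is
only an illustration, not kernel code: struct sim_state, its fields and
simulate() are hypothetical, and nmi_handle() / default_do_nmi() are heavily
simplified stand-ins that merely borrow the kernel names. The model assumes
that two NMIs arriving close together are seen as a single NMI delivery, and
that a claimed NMI_LOCAL event ends dispatch before the NMI_WATCHDOG category
is consulted.

```c
#include <stdbool.h>

/* Simplified stand-ins for two of the NMI handler categories. */
enum nmi_type { NMI_LOCAL, NMI_WATCHDOG };

/* Hypothetical model state, not a kernel structure. */
struct sim_state {
	bool perf_pending;	/* perf counter overflow waiting to be serviced */
	bool hpet_fired;	/* HPET sent its MSI once (edge, never re-sent) */
	int watchdog_checks;	/* how often the watchdog handler actually ran */
};

/* Returns the number of events claimed by handlers of the given category. */
static int nmi_handle(struct sim_state *s, enum nmi_type type)
{
	switch (type) {
	case NMI_LOCAL:
		if (s->perf_pending) {
			s->perf_pending = false;
			return 1;
		}
		return 0;
	case NMI_WATCHDOG:
		s->watchdog_checks++;
		if (s->hpet_fired) {
			s->hpet_fired = false;
			return 1;
		}
		return 0;
	}
	return 0;
}

/* One NMI delivery: a claimed NMI_LOCAL event ends dispatch early. */
static void default_do_nmi(struct sim_state *s)
{
	if (nmi_handle(s, NMI_LOCAL))
		return;

	nmi_handle(s, NMI_WATCHDOG);
}

/* Run a single NMI delivery with the given sources asserted. */
struct sim_state simulate(bool perf_pending, bool hpet_fired)
{
	struct sim_state s = {
		.perf_pending = perf_pending,
		.hpet_fired = hpet_fired,
		.watchdog_checks = 0,
	};

	default_do_nmi(&s);
	return s;
}
```

In this model simulate(true, true) returns with watchdog_checks == 0 and
hpet_fired still set: the perf handler claimed the single delivery, and
because the HPET message is not sent again nothing triggers a second pass.
simulate(false, true) reaches the watchdog handler as intended.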