On Tue, May 10, 2022 at 10:14:00PM +1000, Nicholas Piggin wrote: > Excerpts from Ricardo Neri's message of May 6, 2022 10:00 am: > > The HPET-based hardlockup detector relies on the TSC to determine if an > > observed NMI interrupt was originated by HPET timer. Hence, this detector > > can no longer be used with an unstable TSC. > > > > In such case, permanently stop the HPET-based hardlockup detector and > > start the perf-based detector. > > > > Cc: Andi Kleen <a...@linux.intel.com> > > Cc: Stephane Eranian <eran...@google.com> > > Cc: "Ravi V. Shankar" <ravi.v.shan...@intel.com> > > Cc: io...@lists.linux-foundation.org > > Cc: linuxppc-dev@lists.ozlabs.org > > Cc: x...@kernel.org > > Suggested-by: Thomas Gleixner <t...@linutronix.de> > > Reviewed-by: Tony Luck <tony.l...@intel.com> > > Signed-off-by: Ricardo Neri <ricardo.neri-calde...@linux.intel.com> > > --- > > Changes since v5: > > * Relocated the delcaration of hardlockup_detector_switch_to_perf() to > > x86/nmi.h It does not depend on HPET. > > * Removed function stub. The shim hardlockup detector is always for x86. > > > > Changes since v4: > > * Added a stub version of hardlockup_detector_switch_to_perf() for > > !CONFIG_HPET_TIMER. (lkp) > > * Reconfigure the whole lockup detector instead of unconditionally > > starting the perf-based hardlockup detector. > > > > Changes since v3: > > * None > > > > Changes since v2: > > * Introduced this patch. > > > > Changes since v1: > > * N/A > > --- > > arch/x86/include/asm/nmi.h | 6 ++++++ > > arch/x86/kernel/tsc.c | 2 ++ > > arch/x86/kernel/watchdog_hld.c | 6 ++++++ > > 3 files changed, 14 insertions(+) > > > > diff --git a/arch/x86/include/asm/nmi.h b/arch/x86/include/asm/nmi.h > > index 4a0d5b562c91..47752ff67d8b 100644 > > --- a/arch/x86/include/asm/nmi.h > > +++ b/arch/x86/include/asm/nmi.h > > @@ -63,4 +63,10 @@ void stop_nmi(void); > > void restart_nmi(void); > > void local_touch_nmi(void); > > > > +#ifdef CONFIG_X86_HARDLOCKUP_DETECTOR > > +void hardlockup_detector_switch_to_perf(void); > > +#else > > +static inline void hardlockup_detector_switch_to_perf(void) { } > > +#endif > > + > > #endif /* _ASM_X86_NMI_H */ > > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c > > index cc1843044d88..74772ffc79d1 100644 > > --- a/arch/x86/kernel/tsc.c > > +++ b/arch/x86/kernel/tsc.c > > @@ -1176,6 +1176,8 @@ void mark_tsc_unstable(char *reason) > > > > clocksource_mark_unstable(&clocksource_tsc_early); > > clocksource_mark_unstable(&clocksource_tsc); > > + > > + hardlockup_detector_switch_to_perf(); > > } > > > > EXPORT_SYMBOL_GPL(mark_tsc_unstable); > > diff --git a/arch/x86/kernel/watchdog_hld.c b/arch/x86/kernel/watchdog_hld.c > > index ef11f0af4ef5..7940977c6312 100644 > > --- a/arch/x86/kernel/watchdog_hld.c > > +++ b/arch/x86/kernel/watchdog_hld.c > > @@ -83,3 +83,9 @@ void watchdog_nmi_start(void) > > if (detector_type == X86_HARDLOCKUP_DETECTOR_HPET) > > hardlockup_detector_hpet_start(); > > } > > + > > +void hardlockup_detector_switch_to_perf(void) > > +{ > > + detector_type = X86_HARDLOCKUP_DETECTOR_PERF; > > Another possible problem along the same lines here, > isn't your watchdog still running at this point? And > it uses detector_type in the switch. > > > + lockup_detector_reconfigure(); > > Actually the detector_type switch is used in some > functions called by lockup_detector_reconfigure() > e.g., watchdog_nmi_stop, so this seems buggy even > without concurrent watchdog.
Yes, this true. I missed this race. > > Is this switching a good idea in general? The admin > has asked for non-standard option because they want > more PMU counterss available and now it eats a > counter potentially causing a problem rather than > detecting one. Agreed. A very valid point. > > I would rather just disable with a warning if it were > up to me. If you *really* wanted to be fancy then > allow admin to re-enable via proc maybe. I think that in either case, /proc/sys/kernel/nmi_watchdog need to be updated to reflect that the NMI watchdog has been disabled. That would require to expose other interfaces of the watchdog. Thanks and BR, Ricardo