On Tue, 31 Oct 2017, Guenter Roeck wrote:
> On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
>
> [ ...]
>
> > So we have to revert
> >
> >   a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> >
> > Patch attached.
> >
>
> Tested-by: Guenter Roeck <li...@roeck-us.net>
>
> There is still a problem. When running
>
>   echo 6 > /proc/sys/kernel/watchdog_thresh
>   echo 5 > /proc/sys/kernel/watchdog_thresh
>
> repeatedly, the message
>
>   NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
>
> stops after a while (after ~10-30 iterations, with fluctuations).
> After adding trace messages into hardlockup_detector_perf_disable()
> and hardlockup_detector_perf_enable(), I see:
>
> hardlockup_detector_perf_disable: disable(0): Number of CPUs: 3
> hardlockup_detector_perf_disable: disable(1): Number of CPUs: 2
> hardlockup_detector_perf_disable: disable(2): Number of CPUs: 1
> hardlockup_detector_perf_disable: disable(3): Number of CPUs: 0
> ...
> hardlockup_detector_perf_disable: disable(0): Number of CPUs: 2
> hardlockup_detector_perf_disable: disable(1): Number of CPUs: 1
> hardlockup_detector_perf_disable: disable(2): Number of CPUs: 0
> hardlockup_detector_perf_disable: disable(3): Number of CPUs: -1
> ...
> hardlockup_detector_perf_enable: enable(1): Number of CPUs: -6
> hardlockup_detector_perf_enable: enable(3): Number of CPUs: -5
> hardlockup_detector_perf_enable: enable(2): Number of CPUs: -4
> hardlockup_detector_perf_enable: enable(0): Number of CPUs: -3
>
> Maybe watchdog_cpus needs to be atomic ?

Indeed.

Thanks,

	tglx
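
For illustration only: the negative counts in the trace above are consistent with lost updates when several CPUs increment and decrement a plain int concurrently. The following is a userspace sketch of an atomic counter using C11 atomics and pthreads; it is not the kernel patch. The names model_enable(), model_disable(), run_toggle_model() and first_enables are hypothetical. In the kernel the counter would presumably be an atomic_t updated with atomic_inc_return() and atomic_dec().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 16

/* Userspace stand-in for watchdog_cpus (assumption: models the kernel counter). */
static atomic_int watchdog_cpus;
/* Counts how often an enable saw the 0 -> 1 transition (hypothetical helper). */
static atomic_int first_enables;

static void model_enable(void)
{
	/* fetch_add returns the old value; old == 0 means this is the first CPU. */
	if (atomic_fetch_add(&watchdog_cpus, 1) == 0)
		atomic_fetch_add(&first_enables, 1); /* kernel would print "Enabled" here */
}

static void model_disable(void)
{
	atomic_fetch_sub(&watchdog_cpus, 1);
}

static void *worker(void *arg)
{
	int iterations = *(int *)arg;

	/* Each thread plays one CPU toggling its watchdog on and off. */
	for (int i = 0; i < iterations; i++) {
		model_enable();
		model_disable();
	}
	return NULL;
}

/* Returns the final counter value, or -1 if a thread could not be created. */
int run_toggle_model(int nthreads, int iterations)
{
	pthread_t tid[MAX_THREADS];
	int created = 0;
	int failed = 0;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	atomic_store(&watchdog_cpus, 0);
	atomic_store(&first_enables, 0);

	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[i], NULL, worker, &iterations) != 0) {
			failed = 1;
			break;
		}
		created++;
	}
	/* Join only the threads that were actually started. */
	for (int i = 0; i < created; i++)
		pthread_join(tid[i], NULL);

	return failed ? -1 : atomic_load(&watchdog_cpus);
}
```

Calling run_toggle_model(4, 100000) should return 0 on every run, because each increment is matched by exactly one decrement and no update can be lost. Replacing atomic_int with a plain int would make the final value unpredictable.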