Commit 8d451690 ("watchdog: Fix CPU hotplug regression") causes
an oops or hard lockup when doing

 echo 0 > /proc/sys/kernel/nmi_watchdog
 echo 1 > /proc/sys/kernel/nmi_watchdog

on a kernel booted with nmi_watchdog=1 (the default).

Running laptop-mode-tools and disconnecting or reconnecting AC
power triggers this sequence, making it a common failure scenario
on laptops.

Instead of bailing out of watchdog_disable() when !watchdog_enabled,
we can initialize the hrtimer in watchdog_enable() regardless of the
watchdog_enabled status. This makes it safe to call watchdog_disable()
in the nmi_watchdog=0 case, without the negative effect on the
enabled => disabled => enabled case.

All these tests pass with this patch:
- nmi_watchdog=1
  echo 0 > /proc/sys/kernel/nmi_watchdog
  echo 1 > /proc/sys/kernel/nmi_watchdog

- nmi_watchdog=0
  echo 0 > /sys/devices/system/cpu/cpu1/online

- nmi_watchdog=0
  echo mem > /sys/power/state

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51661

Cc: <sta...@vger.kernel.org> # v3.7
Cc: Norbert Warmuth <nwarm...@t-online.de>
Cc: Joseph Salisbury <joseph.salisb...@canonical.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Signed-off-by: Bjørn Mork <bj...@mork.no>
---
Hello Linus,

This post-v3.7-rc8 regression kills any machine running laptop-mode-tools
with the default config, making it somewhat critical to get a fix into the
v3.7.x stable series ASAP.  I was hoping for some response from Thomas
or the reporters of the original bug, either verifying that my proposal
is OK or providing a better fix.  But I believe that this cannot wait
much longer.

Please apply.  Thanks,

Bjørn


Patch history:

v3:
  added Bugzilla reference and additional recipients
  rebased on current mainline
v2:
  implemented an alternate workaround for the original problem.
v1:
  plain revert of 8d451690


 kernel/watchdog.c |   11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 997c6a1..75a2ab3 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -344,6 +344,10 @@ static void watchdog_enable(unsigned int cpu)
 {
        struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
 
+       /* kick off the timer for the hardlockup detector */
+       hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+       hrtimer->function = watchdog_timer_fn;
+
        if (!watchdog_enabled) {
                kthread_park(current);
                return;
@@ -352,10 +356,6 @@ static void watchdog_enable(unsigned int cpu)
        /* Enable the perf event */
        watchdog_nmi_enable(cpu);
 
-       /* kick off the timer for the hardlockup detector */
-       hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-       hrtimer->function = watchdog_timer_fn;
-
        /* done here because hrtimer_start can only pin to smp_processor_id() */
        hrtimer_start(hrtimer, ns_to_ktime(sample_period),
                      HRTIMER_MODE_REL_PINNED);
@@ -369,9 +369,6 @@ static void watchdog_disable(unsigned int cpu)
 {
        struct hrtimer *hrtimer = &__raw_get_cpu_var(watchdog_hrtimer);
 
-       if (!watchdog_enabled)
-               return;
-
        watchdog_set_prio(SCHED_NORMAL, 0);
        hrtimer_cancel(hrtimer);
        /* disable the perf event */
-- 
1.7.10.4
