Don Zickus <dzic...@redhat.com> writes: > On Wed, Dec 19, 2012 at 08:51:31PM +0100, Bjørn Mork wrote: >> commit 8d451690 ("watchdog: Fix CPU hotplug regression") cause >> an oops or hard lockup when doing >> >> echo 0 > /proc/sys/kernel/nmi_watchdog >> echo 1 > /proc/sys/kernel/nmi_watchdog >> >> and the kernel is booted with nmi_watchdog=1 (default) >> >> Running laptop-mode-tools and disconnecting/connecting AC power >> will cause this to trigger, making it a common failure scenario >> on laptops. >> >> Instead of bailing out of watchdog_disable() when !watchdog_enabled >> we can initialize the hrtimer regardless of watchdog_enabled status. >> This makes it safe to call watchdog_disable() in the nmi_watchdog=0 >> case, without the negative effect on the enabled => disabled => >> enabled case. >> >> All these tests pass with this patch: >> - nmi_watchdog=1 >> echo 0 > /proc/sys/kernel/nmi_watchdog >> echo 1 > /proc/sys/kernel/nmi_watchdog >> >> - nmi_watchdog=0 >> echo 0 > /sys/devices/system/cpu/cpu1/online >> >> - nmi_watchdog=0 >> echo mem > /sys/power/state > > What about the opposite cases? > nmi_watchdog=1 > echo 1 > /sys/devices/system/cpu/cpu1/online
I don't see why not. But verifying it would be nice. I thought that it would be a simple thing to test using qemu-kvm, but it seems that the CPU hotplugging support there isn't quite ready. The guest just dies with "Assertion `bus->allow_hotplug' failed." I'll go digging for alternatives, but if anyone else could verify this then I'd appreciate it. > Are we going to leak memory by re-initing hrtimer? Or is that caught > somewhere? There are no allocations in hrtimer_init() AFAICS? It just initializes the preallocated hrtimer struct. > Other than that, the patch seems reasonable to me. Basically just forcing > the init of hrtimer so there is something to cancel on the disable side. > Then again, I don't fully understand the original problem. I believe the original problem was calling hrtimer_cancel on an uninitialized hrtimer. That is likely to blow up here: static inline void unlock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags) { raw_spin_unlock_irqrestore(&timer->base->cpu_base->lock, *flags); } Note that the lock_hrtimer_base() function protects access to timer->base with a NULL test. That should probably be done in unlock as well, for symmetry? But this is another issue and not at all urgent. Bjørn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/