On 03/14/2016 09:34 AM, Don Zickus wrote:
On Sat, Mar 12, 2016 at 06:50:26PM -0500, Joshua Hunt wrote:
While working on a script to restore all sysctl params before a series of
tests I found that writing any value into the
/proc/sys/kernel/{nmi_watchdog,soft_watchdog,watchdog,watchdog_thresh}
causes them to call proc_watchdog_update(). Not only that, but when I
wrote to these proc files in a loop I could easily trigger a soft lockup.

[  955.756196] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[  955.765994] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[  955.774619] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[  955.783182] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[  959.788319] NMI watchdog: BUG: soft lockup - CPU#4 stuck for 30s! [swapper/4:0]
[  959.788325] NMI watchdog: BUG: soft lockup - CPU#5 stuck for 30s! [swapper/5:0]

There doesn't appear to be a reason for doing this work every time a
write occurs, so only do the work when the values change.

Hi Josh,

Thanks for the patch.  I have no objections to it, but Uli and I were
interested in the reason for the softlockups.  Uli is going to provide a
test patch to see if his theory is correct.  That way we fix the underlying
issue and then apply your patch on top. Make sense?

Yep. Sounds good. I meant to mention I didn't diagnose the soft-lockup. If you provide a patch I'm happy to test. I can also attempt to debug that part more if needed.

Josh


Cheers,
Don


Signed-off-by: Josh Hunt <joh...@akamai.com>
---
  kernel/watchdog.c |    9 ++++++++-
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b3ace6e..9acb29f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -923,6 +923,9 @@ static int proc_watchdog_common(int which, struct ctl_table *table, int write,
                 * both lockup detectors are disabled if proc_watchdog_update()
                 * returns an error.
                 */
+               if (old == new)
+                       goto out;
+
                err = proc_watchdog_update();
        }
  out:
@@ -967,7 +970,7 @@ int proc_soft_watchdog(struct ctl_table *table, int write,
  int proc_watchdog_thresh(struct ctl_table *table, int write,
                         void __user *buffer, size_t *lenp, loff_t *ppos)
  {
-       int err, old;
+       int err, old, new;

        get_online_cpus();
        mutex_lock(&watchdog_proc_mutex);
@@ -987,6 +990,10 @@ int proc_watchdog_thresh(struct ctl_table *table, int write,
        /*
         * Update the sample period. Restore on failure.
         */
+       new = ACCESS_ONCE(watchdog_thresh);
+       if (old == new)
+               goto out;
+
        set_sample_period();
        err = proc_watchdog_update();
        if (err) {
--
1.7.9.5
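
For illustration only, the guard the patch adds can be sketched in user space. This is not the kernel code: thresh, update_calls, expensive_update() and write_thresh() are hypothetical stand-ins for watchdog_thresh, the set_sample_period()/proc_watchdog_update() path and the proc handler. It only shows the idea of comparing the old and new values and skipping the reconfiguration when nothing changed.

```c
/* Hypothetical stand-ins for kernel state; not the real kernel code. */
static int thresh = 10;   /* plays the role of watchdog_thresh */
static int update_calls;  /* counts how often the expensive path runs */

/* Stand-in for set_sample_period() followed by proc_watchdog_update(). */
static int expensive_update(void)
{
	update_calls++;
	return 0;
}

/* Store a new value, but only run the expensive update if it changed. */
static int write_thresh(int value)
{
	int old = thresh;

	thresh = value;
	if (old == thresh)
		return 0;       /* unchanged: nothing to reconfigure */

	if (expensive_update() != 0) {
		thresh = old;   /* restore the previous value on failure */
		return -1;
	}
	return 0;
}
```

In this sketch, calling write_thresh(10) any number of times leaves update_calls at 0, and a following write_thresh(20) raises it to 1.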
