Re: Crashes in perf_event_ctx_lock_nested

2017-11-01 Thread Guenter Roeck
On Wed, Nov 01, 2017 at 02:11:27PM -0400, Don Zickus wrote:
> >
> > Maybe watchdog_cpus needs to be atomic ?
>
> I switched it to atomic and it solves that problem. The functionality isn't
> broken currently, just the informational message.
>
> Patch attached to try.
>
Tested-by: Guenter Roec
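The exchange above turns on making a shared counter atomic so that concurrent updates are not lost. As a rough userspace illustration only (not the attached patch, which is not shown here), the sketch below uses C11 atomics and pthreads. The `watchdog_cpus` name is borrowed from the message; `worker`, `run_demo` and `MAX_THREADS` are hypothetical. The kernel would use its own `atomic_t` helpers rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define MAX_THREADS 16  /* hypothetical upper bound for this demo */

/* Userspace stand-in for the counter discussed in the thread. */
static atomic_int watchdog_cpus;

struct worker_arg {
	int iterations;
};

/* Each worker mimics a CPU enabling and then disabling the watchdog. */
static void *worker(void *p)
{
	const struct worker_arg *arg = p;

	for (int i = 0; i < arg->iterations; i++) {
		atomic_fetch_add(&watchdog_cpus, 1);  /* "enable"  */
		atomic_fetch_sub(&watchdog_cpus, 1);  /* "disable" */
	}
	return NULL;
}

/* Runs the workers and returns the final counter value.
 * With atomic updates no increment or decrement is lost,
 * so the result is 0. */
int run_demo(int nthreads, int iterations)
{
	pthread_t tids[MAX_THREADS];
	struct worker_arg arg = { .iterations = iterations };

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	atomic_store(&watchdog_cpus, 0);

	for (int i = 0; i < nthreads; i++) {
		int rc = pthread_create(&tids[i], NULL, worker, &arg);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < nthreads; i++)
		pthread_join(tids[i], NULL);

	return atomic_load(&watchdog_cpus);
}
```

With a plain `int` and `++`/`--` in place of the atomic operations, the same loop could lose updates and leave a nonzero count, which would match the symptom described: only the reported number is off, not the watchdog itself.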

Re: Crashes in perf_event_ctx_lock_nested

2017-11-01 Thread Thomas Gleixner
On Tue, 31 Oct 2017, Guenter Roeck wrote:
> On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
>
> [ ...]
>
> > So we have to revert
> >
> > a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> >
> > Patch attached.
> >
>
> Tested-by: Guenter Roeck
> >

Re: Crashes in perf_event_ctx_lock_nested

2017-11-01 Thread Don Zickus
On Tue, Oct 31, 2017 at 03:11:07PM -0700, Guenter Roeck wrote:
> On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
>
> [ ...]
>
> > So we have to revert
> >
> > a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
> >
> > Patch attached.
> >
>
> Tested-by

Re: Crashes in perf_event_ctx_lock_nested

2017-11-01 Thread Thomas Gleixner
On Wed, 1 Nov 2017, Peter Zijlstra wrote:
> On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
> > That means we can have the following situation:
> >
> > lock(watchdog_mutex);
> > lockup_detector_reconfigure();
> > cpus_read_lock();
> > stop();
> > park(

Re: Crashes in perf_event_ctx_lock_nested

2017-11-01 Thread Peter Zijlstra
On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:
> That means we can have the following situation:
>
> lock(watchdog_mutex);
> lockup_detector_reconfigure();
> cpus_read_lock();
> stop();
> park()
> update();
> start();
> unpark()

Re: Crashes in perf_event_ctx_lock_nested

2017-10-31 Thread Guenter Roeck
On Tue, Oct 31, 2017 at 10:32:00PM +0100, Thomas Gleixner wrote:

[ ...]

> So we have to revert
>
> a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event destroy")
>
> Patch attached.
>

Tested-by: Guenter Roeck

There is still a problem. When running

echo 6 > /proc/sys/kernel/wat

Re: Crashes in perf_event_ctx_lock_nested

2017-10-31 Thread Thomas Gleixner
On Tue, 31 Oct 2017, Peter Zijlstra wrote:
> On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> > I added some logging and a long msleep() in
> > hardlockup_detector_perf_cleanup().
> > Here is the result:
> >
> > [0.274361] NMI watchdog: hardlockup_detector_perf_in

Re: Crashes in perf_event_ctx_lock_nested

2017-10-31 Thread Don Zickus
> > Is Chrome OS, changing the default timeout from 10s to something else?
> > That would explain it as a script is executed late in the boot cycle and
> > explain the quick restart.
> >
>
> Correct, Chrome OS changes the timeout from 10 to 5 seconds.
>
> A little experiment suggests that the pr

Re: Crashes in perf_event_ctx_lock_nested

2017-10-31 Thread Guenter Roeck
On Tue, Oct 31, 2017 at 02:50:59PM -0400, Don Zickus wrote:
> On Tue, Oct 31, 2017 at 10:16:22AM -0700, Guenter Roeck wrote:
> > On Tue, Oct 31, 2017 at 02:48:50PM +0100, Peter Zijlstra wrote:
> > > On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> > > > I added some logging and a lo

Re: Crashes in perf_event_ctx_lock_nested

2017-10-31 Thread Don Zickus
On Tue, Oct 31, 2017 at 10:16:22AM -0700, Guenter Roeck wrote:
> On Tue, Oct 31, 2017 at 02:48:50PM +0100, Peter Zijlstra wrote:
> > On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> > > I added some logging and a long msleep() in
> > > hardlockup_detector_perf_cleanup().
> > > Here

Re: Crashes in perf_event_ctx_lock_nested

2017-10-31 Thread Don Zickus
On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> Hi Thomas,
>
> we are seeing the following crash in v4.14-rc5/rc7 if
> CONFIG_HARDLOCKUP_DETECTOR
> is enabled.
>
> [5.908021] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
> [5.915836]
> =

Re: Crashes in perf_event_ctx_lock_nested

2017-10-31 Thread Guenter Roeck
On Tue, Oct 31, 2017 at 02:48:50PM +0100, Peter Zijlstra wrote:
> On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> > I added some logging and a long msleep() in
> > hardlockup_detector_perf_cleanup().
> > Here is the result:
> >
> > [0.274361] NMI watchdog: hardlo

Re: Crashes in perf_event_ctx_lock_nested

2017-10-31 Thread Peter Zijlstra
On Mon, Oct 30, 2017 at 03:45:12PM -0700, Guenter Roeck wrote:
> I added some logging and a long msleep() in
> hardlockup_detector_perf_cleanup().
> Here is the result:
>
> [0.274361] NMI watchdog: hardlockup_detector_perf_init
> [0.274915] NMI watchdog: hardlock