On Wed, Nov 01, 2017 at 12:46:00PM -0700, tip-bot for Thomas Gleixner wrote:
> Commit-ID:  1c294733b7b9f712f78d15cfa75ffdea72b79abb
> Gitweb:     
> https://git.kernel.org/tip/1c294733b7b9f712f78d15cfa75ffdea72b79abb
> Author:     Thomas Gleixner <t...@linutronix.de>
> AuthorDate: Tue, 31 Oct 2017 22:32:00 +0100
> Committer:  Thomas Gleixner <t...@linutronix.de>
> CommitDate: Wed, 1 Nov 2017 20:41:27 +0100
> 
> watchdog/harclockup/perf: Revert a33d44843d45 ("watchdog/hardlockup/perf: 
> Simplify deferred event destroy")
> 
> Guenter reported a crash in the watchdog/perf code, which is caused by
> cleanup() and enable() running concurrently. The reason for this is:
> 
> The watchdog functions are serialized via the watchdog_mutex and cpu
> hotplug locking, but the enable of the perf based watchdog happens in
> context of the unpark callback of the smpboot thread. But that unpark
> function is not synchronous inside the locking. The unparking of the thread
> just wakes it up and leaves so there is no guarantee when the thread is
> executing.
> 
> If it starts running _before_ the cleanup happened then it will create a
> event and overwrite the dead event pointer. The new event is then cleaned
> up because the event is marked dead.
> 
>     lock(watchdog_mutex);
>     lockup_detector_reconfigure();
>         cpus_read_lock();
>       stop();
>          park()
>       update();
>       start();
>          unpark()
>       cpus_read_unlock();             thread runs()
>                                         overwrite dead event ptr
>       cleanup();
>         free new event, which is active inside perf....
>     unlock(watchdog_mutex);
> 
> The park side is safe as that actually waits for the thread to reach
> parked state.
> 
> Commit a33d44843d45 removed the protection against this kind of scenario
> under the stupid assumption that the hotplug serialization and the
> watchdog_mutex cover everything. 
> 
> Bring it back.
> 
> Reverts: a33d44843d45 ("watchdog/hardlockup/perf: Simplify deferred event 
> destroy")
> Reported-and-tested-by: Guenter Roeck <li...@roeck-us.net>
> Signed-off-by: Thomas Feels-stupid Gleixner <t...@linutronix.de>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Don Zickus <dzic...@redhat.com>
> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1710312145190.1942@nanos
> 
> ---
>  kernel/watchdog_hld.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
> index 71a62ce..f8db56b 100644
> --- a/kernel/watchdog_hld.c
> +++ b/kernel/watchdog_hld.c
> @@ -21,6 +21,7 @@
>  static DEFINE_PER_CPU(bool, hard_watchdog_warn);
>  static DEFINE_PER_CPU(bool, watchdog_nmi_touch);
>  static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
> +static DEFINE_PER_CPU(struct perf_event *, dead_event);
>  static struct cpumask dead_events_mask;
>  
>  static unsigned long hardlockup_allcpu_dumped;
> @@ -203,6 +204,8 @@ void hardlockup_detector_perf_disable(void)
>  
>       if (event) {
>               perf_event_disable(event);
> +             this_cpu_write(watchdog_ev, NULL);
> +             this_cpu_write(dead_event, event);
>               cpumask_set_cpu(smp_processor_id(), &dead_events_mask);
>               watchdog_cpus--;
>       }
> @@ -218,7 +221,7 @@ void hardlockup_detector_perf_cleanup(void)
>       int cpu;
>  
>       for_each_cpu(cpu, &dead_events_mask) {
> -             struct perf_event *event = per_cpu(watchdog_ev, cpu);
> +             struct perf_event *event = per_cpu(dead_event, cpu);
>  
>               /*
>                * Required because for_each_cpu() reports  unconditionally
> @@ -226,7 +229,7 @@ void hardlockup_detector_perf_cleanup(void)
>                */
>               if (event)
>                       perf_event_release_kernel(event);
> -             per_cpu(watchdog_ev, cpu) = NULL;
> +             per_cpu(dead_event_ev, cpu) = NULL;

Uuhh ... there is an extra _ev which somehow slipped in and doesn't
compile.

Guenter

>       }
>       cpumask_clear(&dead_events_mask);
>  }

Reply via email to