Re: perf: fuzzer BUG: KASAN: stack-out-of-bounds in __unwind_start

Paul E. McKenney Tue, 29 Nov 2016 09:36:45 -0800

On Tue, Nov 29, 2016 at 11:17:25AM -0600, Josh Poimboeuf wrote:
> On Tue, Nov 29, 2016 at 08:51:52AM -0800, Paul E. McKenney wrote:
> > On Tue, Nov 29, 2016 at 09:09:17AM -0600, Josh Poimboeuf wrote:
> > > On Tue, Nov 29, 2016 at 06:07:34AM -0800, Paul E. McKenney wrote:
> > > > On Tue, Nov 29, 2016 at 10:16:50AM +0100, Peter Zijlstra wrote:
> > > > > On Mon, Nov 28, 2016 at 11:52:41PM -0600, Josh Poimboeuf wrote:
> > > > > > > We used to do that, but the resulting NMIs were problematic on some
> > > > > > > platforms.  Perhaps things have gotten better?
> > > > > > 
> > > > > > Did a little digging on git blame and found the following commit (which
> > > > > > seems to be the cause of the KASAN warning and missing stack dump):
> > > > > > 
> > > > > >   bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks")
> > > > > > 
> > > > > > I presume this commit is still needed because of the NMI printk deadlock
> > > > > > issues which were discussed at Kernel Summit.  I guess those issues need
> > > > > > to be sorted out before the above commit can be reverted.
> > > > > 
> > > > > so printk should more or less work from NMI, esp. after:
> > > > > 
> > > > >   42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")
> > > > 
> > > > And of course bc1dce514e9b doesn't revert cleanly, but see hand reversion
> > > > below.  Also, 42a0bb3f7138's commit log calls out MN10300 and Xtensa as
> > > > needing more work.  Has that happened?
> > > 
> > > Petr M, any idea?
> > 
> > My Not-yet-signed-off-by is due to this concern, FWIW.
> 
> I think Petr's replies have addressed that now.
> 
> > > > But I really like the fact that RCU CPU stall warnings dump only those
> > > > stacks that are likely to be involved, and the patch below goes back
> > > > to dumping everyone.  Shouldn't be that hard to fix, though...
> > > 
> > > There's a new trigger_single_cpu_backtrace() function which can be used
> > > for that.
> > 
> > Even better, thank you!  Killed an hour or so of coding, but I must
> > confess that it was a mercy killing.  ;-)
> 
> Ha :-)
> 
> > Much nicer (but completely untested) patch below.
> 
> The kernel/rcu/tree.h changes seem intended for another patch?


Indeed they were, thank you for catching this, fixed.

> Otherwise:
> 
>   Reviewed-by: Josh Poimboeuf <[email protected]>
> 
> Also I think this will fix the KASAN warnings reported by Vince, so you
> might add:
> 
>   Reported-by: Vince Weaver <[email protected]>

Added both of these, thank you!

Updated (but still untested) commit below.


                                                        Thanx, Paul

------------------------------------------------------------------------

commit d3df9bc5fb5d838b049f32a476721eadbc349553
Author: Paul E. McKenney <[email protected]>
Date:   Tue Nov 29 05:49:06 2016 -0800

    rcu: Once again use NMI-based stack traces in stall warnings
    
    This commit is for all intents and purposes a revert of bc1dce514e9b
    ("rcu: Don't use NMIs to dump other CPUs' stacks").  The reason to suppose
    that this can now safely be reverted is the presence of 42a0bb3f7138
    ("printk/nmi: generic solution for safe printk in NMI"), which is said
    to have made NMI-based stack dumps safe.
    
    However, this reversion keeps one nice property of bc1dce514e9b
    ("rcu: Don't use NMIs to dump other CPUs' stacks"), namely that
    only those CPUs blocking the grace period are dumped.  The new
    trigger_single_cpu_backtrace() is used to make this happen, as
    suggested by Josh Poimboeuf.
    
    Reported-by: Vince Weaver <[email protected]>
    Not-yet-signed-off-by: Paul E. McKenney <[email protected]>
    Cc: Petr Mladek <[email protected]>
    Cc: Peter Zijlstra <[email protected]>
    Reviewed-by: Josh Poimboeuf <[email protected]>

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 91a68e4e6671..ba0e4825be9d 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1396,7 +1396,10 @@ static void rcu_check_gp_kthread_starvation(struct rcu_state *rsp)
 }
 
 /*
- * Dump stacks of all tasks running on stalled CPUs.
+ * Dump stacks of all tasks running on stalled CPUs.  First try using
+ * NMIs, but fall back to manual remote stack tracing on architectures
+ * that don't support NMI-based stack dumps.  The NMI-triggered stack
+ * traces are more accurate because they are printed by the target CPU.
  */
 static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 {
@@ -1406,11 +1409,10 @@ static void rcu_dump_cpu_stacks(struct rcu_state *rsp)
 
        rcu_for_each_leaf_node(rsp, rnp) {
                raw_spin_lock_irqsave_rcu_node(rnp, flags);
-               if (rnp->qsmask != 0) {
-                       for_each_leaf_node_possible_cpu(rnp, cpu)
-                               if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
+               for_each_leaf_node_possible_cpu(rnp, cpu)
+                       if (rnp->qsmask & leaf_node_cpu_bit(rnp, cpu))
+                               if (!trigger_single_cpu_backtrace(cpu))
                                        dump_cpu_task(cpu);
-               }
                raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
        }
 }
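
For anyone wanting to poke at the fallback logic outside the kernel, here
is a rough user-space model of the loop in the hunk above.  It is only a
sketch: mock_trigger_single_cpu_backtrace(), mock_dump_cpu_task(),
dump_stalled_cpus(), the arch_has_nmi_backtrace switch, the two counters,
and NR_CPUS_MODEL are all made up for illustration, and a plain bitmask
stands in for rnp->qsmask.  The one property it assumes from the real
trigger_single_cpu_backtrace() is that it returns false when the
architecture cannot do NMI-based backtraces.

```c
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS_MODEL 8	/* hypothetical CPU count for this model */

/* Hypothetical switch: false models an arch without NMI backtraces. */
static bool arch_has_nmi_backtrace = true;

/* Counters so the behavior can be checked from a test harness. */
static int nmi_dumps;
static int manual_dumps;

/* Stand-in for trigger_single_cpu_backtrace(): true if an NMI was sent. */
static bool mock_trigger_single_cpu_backtrace(int cpu)
{
	if (!arch_has_nmi_backtrace)
		return false;
	printf("NMI backtrace requested on CPU %d\n", cpu);
	nmi_dumps++;
	return true;
}

/* Stand-in for dump_cpu_task(): remote dump done by the detecting CPU. */
static void mock_dump_cpu_task(int cpu)
{
	printf("Remote stack dump of task on CPU %d\n", cpu);
	manual_dumps++;
}

/*
 * Visit only the CPUs whose bit is still set in qsmask (those blocking
 * the grace period), preferring the NMI path and falling back to the
 * manual dump when the NMI path is unavailable.
 */
static void dump_stalled_cpus(unsigned long qsmask)
{
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (qsmask & (1UL << cpu))
			if (!mock_trigger_single_cpu_backtrace(cpu))
				mock_dump_cpu_task(cpu);
}
```

Calling dump_stalled_cpus(0x5UL) touches only CPUs 0 and 2, via the NMI
path when arch_has_nmi_backtrace is true and via the manual dump
otherwise.  Locking and the rcu_node tree walk are left out on purpose.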
