On Tue, Nov 29, 2016 at 05:12:46PM +0100, Petr Mladek wrote:
> On Tue 2016-11-29 09:09:17, Josh Poimboeuf wrote:
> > On Tue, Nov 29, 2016 at 06:07:34AM -0800, Paul E. McKenney wrote:
> > > On Tue, Nov 29, 2016 at 10:16:50AM +0100, Peter Zijlstra wrote:
> > > > On Mon, Nov 28, 2016 at 11:52:41PM -0600, Josh Poimboeuf wrote:
> > > > > > We used to do that, but the resulting NMIs were problematic on some
> > > > > > platforms. Perhaps things have gotten better?
> > > > >
> > > > > Did a little digging on git blame and found the following commit (which
> > > > > seems to be the cause of the KASAN warning and missing stack dump):
> > > > >
> > > > > bc1dce514e9b ("rcu: Don't use NMIs to dump other CPUs' stacks")
> > > > >
> > > > > I presume this commit is still needed because of the NMI printk deadlock
> > > > > issues which were discussed at Kernel Summit. I guess those issues need
> > > > > to be sorted out before the above commit can be reverted.
> > > >
> > > > so printk should more or less work from NMI, esp. after:
> > > >
> > > > 42a0bb3f7138 ("printk/nmi: generic solution for safe printk in NMI")
> > >
> > > And of course bc1dce514e9b doesn't revert cleanly, but see hand reversion
> > > below. Also, 42a0bb3f7138's commit log calls out MN10300 and Xtensa as
> > > needing more work. Has that happened?
> >
> > Petr M, any idea?
>
> These two architectures do not support the safe printk in NMI. But
> these architectures also do not implement trigger_all_cpu_backtrace()
> and other trigger_*_backtrace() functions. Therefore these functions
> return false there.
>
> In fact, only very few architectures implement trigger_*_backtrace().
> And only few of them use NMI (x86, arm, tile). I have just double
> checked that these all use the safe printk in NMI.
>
> By other words, if trigger_all_cpu_backtrace() or
> trigger_single_cpu_backtrace() returns true, it should be NMI safe
> and you could use it here.

Good, I will upgrade my commit to Signed-off-by, then.

> > > But I really like the fact that RCU CPU stall warnings dump only those
> > > stacks that are likely to be involved, and the patch below goes back
> > > to dumping everyone. Shouldn't be that hard to fix, though...
> >
> > There's a new trigger_single_cpu_backtrace() function which can be used
> > for that.
>
> There is newly also trigger_cpumask_backtrace(struct cpumask *mask)
> where you could select more CPUs using the mask. If this is of any help.

In my experience, there is almost never a large number of CPUs stalling
a given RCU grace period. But thank you for letting me know about
trigger_cpumask_backtrace(), as it might be useful in the future.

Thanx, Paul