On Thu, Nov 26, 2020 at 09:25:28PM +0000, Dexuan Cui wrote:
> > From: Paul E. McKenney <paul...@kernel.org>
> > Sent: Thursday, November 26, 2020 7:47 AM
> > ...
> > The rcu_segcblist_n_cbs() function returns non-zero because something
> > invoked call_rcu() some time previously. The ftrace facility (or just
> > a printk) should help you work out where that call_rcu() is located.
>
> call_rcu() is indeed called multiple times, but as you said, this should
> be normal.

Good to know, thank you!

> > My best guess is that the underlying bug is that you are invoking
> > rcu_barrier() before the RCU grace-period kthread has been created.
> > This means that RCU grace periods cannot complete, which in turn means
> > that if there has been even one invocation of call_rcu() since boot,
> > rcu_barrier() cannot complete, which is what you are in fact seeing.
> > Please note that it is perfectly legal to invoke call_rcu() very early in
> > the boot process, as in even before the call to rcu_init(). Therefore,
> > if this is the case, the bug is the early call to rcu_barrier(), not
> > the early calls to call_rcu().
> >
> > To check this, at the beginning of rcu_barrier(), check the value of
> > rcu_state.gp_kthread. If my guess is correct, it will be NULL.
>
> Unluckily, it's not NULL here. :-)

You can't have everything! ;-)

> > Another possibility is that rcu_state.gp_kthread is non-NULL, but that
> > something else is preventing RCU grace periods from completing, but in
>
> It looks like somehow the scheduling is not working here: in rcu_barrier(),
> if I replace the wait_for_completion() with
> wait_for_completion_timeout(&rcu_state.barrier_completion, 30*HZ), the
> issue persists.

Have you tried using sysrq-t to see what the various tasks are doing?

One way that this can happen is if whatever task is currently running has
managed to enter a long loop with interrupts disabled.

> > that case you should see RCU CPU stall warnings. Unless of course they
> > have been disabled.
> >
> > Thanx, Paul
>
> I guess I didn't disable the warnings (I don't even know how to do that :)

Having interrupts disabled on all CPUs would have the effect of disabling
the RCU CPU stall warnings. The intended method is in
Documentation/admin-guide/kernel-parameters.txt. Search for
rcu_cpu_stall_suppress. Not that it seems important at this point.

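For reference, a minimal sketch of how one might check whether the stall
warnings have been suppressed. It assumes the system gets far enough to give
you a shell, and it assumes the parameter is exposed at
/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress (please verify that
path on your kernel). The function name stall_warnings_status and its
output strings are made up for illustration.

```shell
#!/bin/sh
# Report whether RCU CPU stall warnings look suppressed.
#   $1: sysfs parameter file (assumed default path; verify on your kernel)
#   $2: file holding the kernel command line
# Prints one of: suppressed-on-cmdline, suppressed-at-runtime, not-suppressed.
# Note: if neither file is readable, this also prints not-suppressed.
stall_warnings_status() {
    param="${1:-/sys/module/rcupdate/parameters/rcu_cpu_stall_suppress}"
    cmdline="${2:-/proc/cmdline}"

    # A boot parameter such as rcupdate.rcu_cpu_stall_suppress=1 shows up here.
    if [ -r "$cmdline" ] && grep -q 'rcu_cpu_stall_suppress=1' "$cmdline"; then
        echo "suppressed-on-cmdline"
        return 0
    fi

    # Otherwise fall back to the runtime value, if the file is readable.
    if [ -r "$param" ] && [ "$(cat "$param")" != "0" ]; then
        echo "suppressed-at-runtime"
        return 0
    fi

    echo "not-suppressed"
}
```

Calling stall_warnings_status with no arguments uses the default paths
above. This only covers the intended suppression method; it cannot detect
the case where interrupts are disabled on all CPUs.
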
Thanx, Paul

> grep RCU .config
> # RCU Subsystem
> CONFIG_TREE_RCU=y
> # CONFIG_RCU_EXPERT is not set
> CONFIG_SRCU=y
> CONFIG_TREE_SRCU=y
> CONFIG_TASKS_RCU_GENERIC=y
> CONFIG_TASKS_RUDE_RCU=y
> CONFIG_TASKS_TRACE_RCU=y
> CONFIG_RCU_STALL_COMMON=y
> CONFIG_RCU_NEED_SEGCBLIST=y
> CONFIG_RCU_NOCB_CPU=y
> # end of RCU Subsystem
> CONFIG_MMU_GATHER_RCU_TABLE_FREE=y
> # RCU Debugging
> # CONFIG_RCU_SCALE_TEST is not set
> # CONFIG_RCU_TORTURE_TEST is not set
> # CONFIG_RCU_REF_SCALE_TEST is not set
> CONFIG_RCU_CPU_STALL_TIMEOUT=30
> CONFIG_RCU_TRACE=y
> CONFIG_RCU_EQS_DEBUG=y
> # end of RCU Debugging
>
> Thanks,
> -- Dexuan