On Fri, May 15, 2020 at 12:30:23AM +0200, Frederic Weisbecker wrote:
> On Thu, May 14, 2020 at 08:47:07AM -0700, Paul E. McKenney wrote:
> > On Thu, May 14, 2020 at 12:45:26AM +0200, Frederic Weisbecker wrote:
> > This last seems best to me. The transition from CBLIST_NOT_OFFLOADED
> > to CBLIST_OFFLOADING of course needs to be on the CPU in question with
> > at least bh disabled. Probably best to be holding rcu_nocb_lock(),
> > but that might just be me being overly paranoid.
>
> So that's in the case of offloading, right? Well, I don't think we'd
> need to even disable bh nor lock nocb. We just need the current CPU
> to see the local update of cblist->offloaded = CBLIST_OFFLOADING
> before the kthread is unparked:
>
> 	cblist->offloaded = CBLIST_OFFLOADING;
> 	/* Make sure subsequent softirq lock nocb */
> 	barrier();
> 	kthread_unpark(rdp->nocb_cb_thread);
>
> Now, although that guarantees that nocb_cb will see CBLIST_OFFLOADING
> upon unparking, it's not guaranteed that the nocb_gp will see it on its
> next round. Ok so eventually you're right, I should indeed lock nocb...
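The ordering concern in the quoted text can be sketched in user space. barrier() only constrains the compiler on the local CPU, so it says nothing about when a kthread already running elsewhere (the nocb_gp kthread) observes the new state. Publishing the state under a lock that the reader also takes gives that guarantee. This is a minimal analogue, assuming pthreads: a pthread mutex stands in for rcu_nocb_lock(), a polling thread stands in for nocb_gp, and the state names reuse those proposed in the thread. All `sketch_*` identifiers are hypothetical and none of this is the kernel's actual code.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* State names as proposed in the thread; values are illustrative. */
enum cblist_state {
	CBLIST_NOT_OFFLOADED,
	CBLIST_OFFLOADING,
	CBLIST_OFFLOADED,
};

/* Hypothetical per-CPU data; nocb_lock stands in for rcu_nocb_lock(). */
struct sketch_nocb_rdp {
	pthread_mutex_t nocb_lock;
	enum cblist_state offloaded;
	enum cblist_state gp_seen;	/* what the "nocb_gp" thread observed */
};

/* Writer side: publish the transition while holding the lock. */
static void sketch_offload_start(struct sketch_nocb_rdp *rdp)
{
	pthread_mutex_lock(&rdp->nocb_lock);
	rdp->offloaded = CBLIST_OFFLOADING;
	pthread_mutex_unlock(&rdp->nocb_lock);
}

/*
 * Reader side, standing in for the nocb_gp kthread: every "round" samples
 * the state under the same lock, so once the writer has unlocked, the next
 * round is guaranteed to observe CBLIST_OFFLOADING.
 */
static void *sketch_nocb_gp(void *arg)
{
	struct sketch_nocb_rdp *rdp = arg;
	enum cblist_state s;

	for (;;) {
		pthread_mutex_lock(&rdp->nocb_lock);
		s = rdp->offloaded;
		pthread_mutex_unlock(&rdp->nocb_lock);
		if (s != CBLIST_NOT_OFFLOADED)
			break;
		sched_yield();	/* let the writer make progress */
	}
	rdp->gp_seen = s;
	return NULL;
}

/* Runs one offload transition against a concurrently polling reader. */
static enum cblist_state sketch_run_offload(void)
{
	struct sketch_nocb_rdp rdp;
	pthread_t gp;

	pthread_mutex_init(&rdp.nocb_lock, NULL);
	rdp.offloaded = CBLIST_NOT_OFFLOADED;
	rdp.gp_seen = CBLIST_NOT_OFFLOADED;

	if (pthread_create(&gp, NULL, sketch_nocb_gp, &rdp) != 0)
		return CBLIST_NOT_OFFLOADED;
	sketch_offload_start(&rdp);	/* may land before or after gp's first round */
	pthread_join(gp, NULL);		/* join orders gp's write of gp_seen before our read */
	pthread_mutex_destroy(&rdp.nocb_lock);
	return rdp.gp_seen;
}
```

Whichever thread runs first, the reader's next locked round sees the published state. That cross-thread guarantee is the one a local compiler barrier cannot give.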

I suspect that our future selves would hate us much less if we held that lock. ;-)

> > > > > +static long rcu_nocb_rdp_deoffload(void *arg)
> > > > > +{
> > > > > +	struct rcu_data *rdp = arg;
> > > > > +
> > > > > +	WARN_ON_ONCE(rdp->cpu != raw_smp_processor_id());
> > > > > +	__rcu_nocb_rdp_deoffload(rdp);
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > >
> > > > For example, is the problem caused by invocations of this
> > > > rcu_nocb_rdp_deoffload() function?
> > >
> > > How so?
> >
> > It looked to me like it wasn't excluding either rcu_barrier() or CPU
> > hotplug. It might also not have been pinning onto the CPU in question,
> > but that might just be me misremembering. Then again, I didn't see a
> > call to it, so maybe its callers set things up appropriately.
> >
> > OK, I will bite... What is the purpose of rcu_nocb_rdp_deoffload()? ;-)
>
> Ah it's called using work_on_cpu() which launch a workqueue on the
> target and waits for completion. And that whole thing is protected
> inside the barrier mutex and hotplug.

Ah! Yet again, color me blind.

							Thanx, Paul

> > Agreed! And I do believe that concurrent callback execution will
> > prove better than a possibly indefinite gap in callback execution.
>
> Mutual agreement! :-)
>
> Thanks.
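The calling arrangement Frederic describes can be sketched in user space. work_on_cpu() runs the function on the target CPU and blocks until it completes, and the caller serializes the whole thing under the barrier mutex while CPU hotplug is held off. This is a rough analogue only, assuming pthreads and C11 `_Thread_local`. A per-thread "CPU id" replaces real CPU affinity, a plain mutex stands in for the barrier mutex, and hotplug exclusion is left as a comment. All `sketch_*` identifiers are hypothetical. `sketch_work_on_cpu()` merely mirrors the `long (*fn)(void *)` shape of the kernel's work_on_cpu() and is not that function.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical: the "CPU" the current thread pretends to run on. */
static _Thread_local int sketch_this_cpu = -1;

/* Hypothetical stand-in for the barrier mutex serializing (de)offload. */
static pthread_mutex_t sketch_barrier_mutex = PTHREAD_MUTEX_INITIALIZER;

struct sketch_work {
	int cpu;
	long (*fn)(void *);
	void *arg;
	long ret;
};

static void *sketch_worker(void *p)
{
	struct sketch_work *w = p;

	sketch_this_cpu = w->cpu;	/* "we are now running on w->cpu" */
	w->ret = w->fn(w->arg);
	return NULL;
}

/* Rough analogue of work_on_cpu(): run fn "on" cpu, wait, return its result. */
static long sketch_work_on_cpu(int cpu, long (*fn)(void *), void *arg)
{
	struct sketch_work w = { .cpu = cpu, .fn = fn, .arg = arg, .ret = -1 };
	pthread_t t;

	if (pthread_create(&t, NULL, sketch_worker, &w) != 0)
		return -1;
	pthread_join(t, NULL);		/* caller blocks until fn has completed */
	return w.ret;
}

struct sketch_rdp {
	int cpu;
	int offloaded;
};

/* Same shape as rcu_nocb_rdp_deoffload(): must run on rdp->cpu. */
static long sketch_rdp_deoffload(void *arg)
{
	struct sketch_rdp *rdp = arg;

	/* The patch only warns here; this sketch reports the mismatch instead. */
	if (rdp->cpu != sketch_this_cpu)
		return -1;
	rdp->offloaded = 0;
	return 0;
}

/* Caller side: serialize against other (de)offloads, then dispatch. */
static long sketch_deoffload_cpu(struct sketch_rdp *rdp)
{
	long ret;

	pthread_mutex_lock(&sketch_barrier_mutex);
	/* A kernel caller would also exclude CPU hotplug at this point. */
	ret = sketch_work_on_cpu(rdp->cpu, sketch_rdp_deoffload, rdp);
	pthread_mutex_unlock(&sketch_barrier_mutex);
	return ret;
}
```

Calling `sketch_rdp_deoffload()` directly from an unrelated thread fails the CPU check, while going through `sketch_deoffload_cpu()` succeeds. That is the "pinning onto the CPU in question" property Paul was asking about.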