On Thu, Jun 18, 2026 at 09:46:04PM +0900, Harry Yoo wrote:
> On 6/18/26 5:40 AM, Paul E. McKenney wrote:
> > On Wed, Jun 17, 2026 at 07:38:16AM +0200, Vlastimil Babka (SUSE) wrote:
> >> On 6/17/26 07:14, Harry Yoo wrote:
> >>> On 6/17/26 2:24 AM, Vlastimil Babka (SUSE) wrote:
> >>>> On 6/15/26 13:06, Harry Yoo (Oracle) wrote:
> >>>>> As suggested by Vlastimil Babka, kfree_rcu_sheaf() can be used
> >>>>> on PREEMPT_RT if we always assume spinning is not allowed on PREEMPT_RT.
> >>>>> This is because local_trylock and spinlock_t are safe to use with
> >>>>> trylock variant as long as the kernel does not spin and the context is
> >>>>> not NMI and not hardirq.
> >>>>>
> >>>>> Now that __kfree_rcu_sheaf() knows how to handle allow_spin = false,
> >>>>> relax the limitation and try the sheaves path on PREEMPT_RT as well.
> >>>>>
> >>>>> Keep the lockdep map on non RT kernels. However, do not use the lockdep
> >>>>> map on PREEMPT_RT to avoid suppressing valid lockdep warnings.
> >>>>>
> >>>>> Link: https://lore.kernel.org/linux-mm/[email protected]
> >>>>> Suggested-by: Vlastimil Babka (SUSE) <[email protected]>
> >>>>> Signed-off-by: Harry Yoo (Oracle) <[email protected]>
> >>>>
> >>>> LGTM, but maybe unnecessary pessimistic wrt call_rcu() on PREEMPT_RT?
> >>>> I thought (in the Link: above) we'd only need to downgrade allow_spin to
> >>>> false on PREEMPT_RT for handling sheaves movement from/to barn and
> >>>> alloc_empty_sheaf(), but call_rcu() would be safe from kfree_rcu() even on
> >>>> RT?
> >>>
> >>> Indeed. Good point, thanks!
> >>>
> >>> Hmm, but I'm not sure that it's worth the complexity given that
> >>> PREEMPT_RT tries very hard to avoid disabling IRQs...
> >>>
> >>>> Or is the irqs_disabled() condition rare enough so we don't care?
> >>>
> >>> Given that most users don't call kfree_rcu() under raw spinlock or
> >>> IRQs-disabled section on PREEMPT_RT, I think it's okay to keep it as is
> >>> (it's not making things worse, at least) and wait for call_rcu_nolock()?
> >>
> >> Sounds good.
> >>
> >>> On a side note, I don't have much idea on what needs to call kfree_rcu()
> >>> under a raw spinlock, other than set_cpus_allowed_force(), which should
> >>> really be using kfree_nolock() instead of kfree_rcu() once we support
> >>> kmalloc() -> kfree_nolock():
> >>
> >> Looks like the case. Well if the fallback path of kfree_nolock() that is
> >> irq_work_queue() is indeed safe here.
> >>
> >>>>  /*
> >>>>   * Because this is called with p->pi_lock held, it is not possible
> >>>>   * to use kfree() here (when PREEMPT_RT=y), therefore punt to using
> >>>>   * kfree_rcu().
> >>>>   */
> >>>>  kfree_rcu((union cpumask_rcuhead *)ac.user_mask, rcu);
> >>>
> >>> Any thoughts, RCU/RT folks? 
> 
> Thanks for looking into it, Paul!
> 
> Perhaps I'm missing some context here... let me clarify.
> 
> > For the call_rcu*() counterparts, I am currently considering making the
> > existing functions check for interrupts disabled, using irq_work_queue()
> > in that case. 
> 
> Ack.
> 
> > I suppose that I could use raise_softirq() in the
> > use_softirq=1 case when in_hardirq(). 
> 
> Ack.
> 
> > Either way, the check should be
> > cheap compared to rest of the processing.
> 
> Agreed.
> 
> > The additional rcu_barrier() work required is of course way down in
> 
> Assuming "The additional rcu_barrier() work required" means
> rcu_barrier() now needs to wait for all CPUs to complete irq_work or
> softirq...

What I would instead have rcu_barrier() do is to: (1) pull in the pending
callbacks awaiting irq-work, (2) wait for an RCU grace period (and thus
for any pending irq-work handlers), (3) invoke the underlying call_rcu()
machinery on these callbacks (just as the irq-work handler would,
and also the rcu_core() function if the raise_softirq() optimization
is implemented), and (4) proceed as it currently does, which is the
for_each_possible_cpu(cpu) just before the retry: label.

That is, item #1 above would start just after this block of code:

        init_completion(&rcu_state.barrier_completion);
        atomic_set(&rcu_state.barrier_cpu_count, 2);
        raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);
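
To make the ordering of those four steps concrete, here is a rough userspace
model.  It is only a sketch: every name in it (model_call_rcu(),
model_rcu_barrier(), the "deferred" and "queued" lists, the irqs_off flag)
is hypothetical and stands in for kernel machinery, nothing here is the
actual tree.c code, and step 4 is collapsed into simply invoking whatever
has been queued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's rcu_head or rcu_segcblist. */
struct cb {
	struct cb *next;
	void (*func)(struct cb *);
};

struct cb_list {
	struct cb *head;
	struct cb **tail;
};

/* Callbacks posted with "interrupts disabled", still awaiting irq-work. */
static struct cb_list deferred = { NULL, &deferred.head };
/* Callbacks already handed to the normal call_rcu() machinery. */
static struct cb_list queued = { NULL, &queued.head };

static int irqs_off;		/* stand-in for irqs_disabled() */
static int grace_periods;	/* counts modeled grace periods */
static int ran;			/* counts callback invocations */

static void cb_list_add(struct cb_list *l, struct cb *c)
{
	c->next = NULL;
	*l->tail = c;
	l->tail = &c->next;
}

/* Append all of src to dst and leave src empty. */
static void cb_list_splice(struct cb_list *dst, struct cb_list *src)
{
	if (!src->head)
		return;
	*dst->tail = src->head;
	dst->tail = src->tail;
	src->head = NULL;
	src->tail = &src->head;
}

/* The cheap check: defer when interrupts are off, else queue normally. */
static void model_call_rcu(struct cb *c, void (*func)(struct cb *))
{
	c->func = func;
	cb_list_add(irqs_off ? &deferred : &queued, c);
}

static int model_rcu_barrier(void)
{
	struct cb_list pulled = { NULL, &pulled.head };
	struct cb *c, *next;
	int invoked = 0;

	cb_list_splice(&pulled, &deferred);	/* (1) pull in pending callbacks */
	grace_periods++;			/* (2) wait for a grace period */
	cb_list_splice(&queued, &pulled);	/* (3) hand them to call_rcu() */
	for (c = queued.head; c; c = next) {	/* (4) proceed as before */
		next = c->next;
		c->func(c);
		invoked++;
	}
	queued.head = NULL;
	queued.tail = &queued.head;
	assert(!deferred.head);	/* nothing may remain stranded */
	return invoked;
}

static void count_cb(struct cb *c)
{
	(void)c;
	ran++;
}

/* One callback posted with irqs "off", one with irqs "on". */
static int barrier_demo(void)
{
	static struct cb a, b;

	ran = 0;
	irqs_off = 1;
	model_call_rcu(&a, count_cb);
	irqs_off = 0;
	model_call_rcu(&b, count_cb);
	return model_rcu_barrier();
}
```

The point of the model is only that the barrier must not return while a
callback is still sitting on the deferred list, which is why step 1 comes
before the existing per-CPU processing.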

> > the noise compared to acquiring a global mutex.
> 
> Are you referring to a specific mutex, or just in general?

A specific mutex, namely the one that rcu_barrier() already acquires:

        mutex_lock(&rcu_state.barrier_mutex);

> > In your case, kfree_rcu() can be quite a bit lighter weight, though.
> > So the extra checks might not be lost in the noise.
> 
> Assuming "the extra checks" means checking whether interrupts are
> disabled in call_rcu()*...

Yes, exactly.

> It will probably be fine since we invoke call_rcu() only when sheaves
> become full, not once for each object?

In the part of my reply starting with "In your case, kfree_rcu()", I was
thinking of these extra checks potentially also being added to kfree_rcu()
itself.

> Perhaps slightly off-topic; at some point, though, I think it'd be
> better to teach SLUB to use RCU polling API for RCU sheaves.
> 
> e.g.) put sheaves into barn->sheaves_pending instead of invoking
> call_rcu(), and the SLUB alloc slowpath checks if there are sheaves
> past grace period before allocating and refilling sheaves.

That makes a great deal of sense to me!

In fact, this is one of the types of use cases that I was thinking of back
when I was implementing the RCU polling API.  ;-)
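
For concreteness, the sheaves_pending idea might look roughly like the
userspace model below.  This is a sketch under loud assumptions: the
model_get_state() and model_poll_state() helpers merely imitate the role
of get_state_synchronize_rcu() and poll_state_synchronize_rcu() using a
plain counter, and the struct layouts and barn_*() helpers are hypothetical
rather than anything in SLUB today.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Modeled count of completed grace periods (hypothetical). */
static unsigned long model_gp_seq;

/* Cookie that is satisfied once one more full grace period completes. */
static unsigned long model_get_state(void)
{
	return model_gp_seq + 1;
}

/* True if the grace period named by the cookie has already elapsed. */
static bool model_poll_state(unsigned long cookie)
{
	return (long)(model_gp_seq - cookie) >= 0;
}

/* Hypothetical, heavily simplified sheaf and barn. */
struct sheaf {
	struct sheaf *next;
	unsigned long gp_cookie;
};

struct barn {
	struct sheaf *pending_head;	/* oldest pending sheaf first */
	struct sheaf **pending_tail;
};

static void barn_init(struct barn *b)
{
	b->pending_head = NULL;
	b->pending_tail = &b->pending_head;
}

/* Instead of call_rcu(): record a cookie and park the full sheaf. */
static void barn_defer_sheaf(struct barn *b, struct sheaf *s)
{
	s->gp_cookie = model_get_state();
	s->next = NULL;
	*b->pending_tail = s;
	b->pending_tail = &s->next;
}

/*
 * Alloc slowpath hook: detach every sheaf whose grace period has passed
 * and return how many were detached.  Cookies are handed out in FIFO
 * order, so the scan can stop at the first sheaf that is not yet ready.
 */
static int barn_reclaim_ready(struct barn *b)
{
	int n = 0;

	while (b->pending_head &&
	       model_poll_state(b->pending_head->gp_cookie)) {
		struct sheaf *s = b->pending_head;

		b->pending_head = s->next;
		if (!b->pending_head)
			b->pending_tail = &b->pending_head;
		s->next = NULL;	/* a real version would free the objects here */
		n++;
	}
	return n;
}

/* Two sheaves deferred one grace period apart. */
static int polling_demo(void)
{
	struct barn b;
	struct sheaf s1, s2;
	int early, late;

	barn_init(&b);
	barn_defer_sheaf(&b, &s1);
	model_gp_seq++;			/* a grace period elapses */
	barn_defer_sheaf(&b, &s2);
	early = barn_reclaim_ready(&b);	/* only s1 is ready */
	model_gp_seq++;			/* another grace period elapses */
	late = barn_reclaim_ready(&b);	/* now s2 is ready */
	return early * 10 + late;
}
```

The attraction is that nothing is queued to RCU at all on the free side, so
the question of whether call_rcu() is safe in the caller's context does not
arise there; the cost moves to a cheap poll in the alloc slowpath.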

                                                        Thanx, Paul
