On Thu, Jun 18, 2026 at 09:46:04PM +0900, Harry Yoo wrote:
> On 6/18/26 5:40 AM, Paul E. McKenney wrote:
> > On Wed, Jun 17, 2026 at 07:38:16AM +0200, Vlastimil Babka (SUSE) wrote:
> >> On 6/17/26 07:14, Harry Yoo wrote:
> >>> On 6/17/26 2:24 AM, Vlastimil Babka (SUSE) wrote:
> >>>> On 6/15/26 13:06, Harry Yoo (Oracle) wrote:
> >>>>> As suggested by Vlastimil Babka, kfree_rcu_sheaf() can be used
> >>>>> on PREEMPT_RT if we always assume spinning is not allowed on PREEMPT_RT.
> >>>>> This is because local_trylock and spinlock_t are safe to use with
> >>>>> trylock variant as long as the kernel does not spin and the context is
> >>>>> not NMI and not hardirq.
> >>>>>
> >>>>> Now that __kfree_rcu_sheaf() knows how to handle allow_spin = false,
> >>>>> relax the limitation and try the sheaves path on PREEMPT_RT as well.
> >>>>>
> >>>>> Keep the lockdep map on non RT kernels. However, do not use the lockdep
> >>>>> map on PREEMPT_RT to avoid suppressing valid lockdep warnings.
> >>>>>
> >>>>> Link:
> >>>>> https://lore.kernel.org/linux-mm/[email protected]
> >>>>> Suggested-by: Vlastimil Babka (SUSE) <[email protected]>
> >>>>> Signed-off-by: Harry Yoo (Oracle) <[email protected]>
> >>>>
> >>>> LGTM, but maybe unnecessary pessimistic wrt call_rcu() on PREEMPT_RT?
> >>>> I thought (in the Link: above) we'd only need to downgrade allow_spin to
> >>>> false on PREEMPT_RT for handling sheaves movement from/to barn and
> >>>> alloc_empty_sheaf(), but call_rcu() would be safe from kfree_rcu() even
> >>>> on RT?
> >>>
> >>> Indeed. Good point, thanks!
> >>>
> >>> Hmm, but I'm not sure that it's worth the complexity given that
> >>> PREEMPT_RT tries very hard to avoid disabling IRQs...
> >>>
> >>>> Or is the irqs_disabled() condition rare enough so we don't care?
> >>>
> >>> Given that most users don't call kfree_rcu() under raw spinlock or
> >>> IRQs-disabled section on PREEMPT_RT, I think it's okay to keep it as is
> >>> (it's not making things worse, at least) and wait for call_rcu_nolock()?
> >>
> >> Sounds good.
> >>
> >>> On a side note, I don't have much idea on what needs to call kfree_rcu()
> >>> under a raw spinlock, other than set_cpus_allowed_force(), which should
> >>> really be using kfree_nolock() instead of kfree_rcu() once we support
> >>> kmalloc() -> kfree_nolock():
> >>
> >> Looks like the case. Well if the fallback path of kfree_nolock() that is
> >> irq_work_queue() is indeed safe here.
> >>
> >>>> /*
> >>>> * Because this is called with p->pi_lock held, it is not possible
> >>>> * to use kfree() here (when PREEMPT_RT=y), therefore punt to using
> >>>> * kfree_rcu().
> >>>> */
> >>>> kfree_rcu((union cpumask_rcuhead *)ac.user_mask, rcu);
> >>>
> >>> Any thoughts, RCU/RT folks?
>
> Thanks for looking into it, Paul!
>
> Perhaps I'm missing some context here... let me clarify.
>
> > For the call_rcu*() counterparts, I am currently considering making the
> > existing functions check for interrupts disabled, using irq_work_queue()
> > in that case.
>
> Ack.
>
> > I suppose that I could use raise_softirq() in the
> > use_softirq=1 case when in_hardirq().
>
> Ack.
>
> > Either way, the check should be
> > cheap compared to rest of the processing.
>
> Agreed.
>
> > The additional rcu_barrier() work required is of course way down in
>
> Assuming "The additional rcu_barrier() work required" means
> rcu_barrier() now needs to wait for all CPUs to complete irq_work or
> softirq...

What I would instead have rcu_barrier() do is to: (1) pull in the pending
callbacks awaiting irq-work, (2) wait for an RCU grace period (and thus
for any pending irq-work handlers), (3) invoke the underlying call_rcu()
machinery on these callbacks (just as the irq-work handler would,
and also the rcu_core() function if the raise_softirq() optimization
is implemented), and (4) proceed as it currently does, which is the
for_each_possible_cpu(cpu) just before the retry: label.

That is, item #1 above would start just after this block of code:

	init_completion(&rcu_state.barrier_completion);
	atomic_set(&rcu_state.barrier_cpu_count, 2);
	raw_spin_unlock_irqrestore(&rcu_state.barrier_lock, flags);

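To make the four steps concrete, here is a single-threaded userspace
toy model, not kernel code.  Everything named toy_* is hypothetical
and exists only for illustration: the per-CPU "deferred" list stands in
for callbacks awaiting irq-work, and the grace-period wait is an empty
placeholder.  It ignores all locking and concurrency, and it does not
preserve callback order.

```c
#include <stddef.h>

#define TOY_NR_CPUS 4

/* Toy callback node; stands in for struct rcu_head. */
struct toy_cb {
	struct toy_cb *next;
};

/* Hypothetical per-CPU state, not the layout of struct rcu_data. */
struct toy_cpu {
	struct toy_cb *deferred;	/* queued with IRQs off, awaiting irq-work */
	struct toy_cb *cblist;		/* already given to the call_rcu() machinery */
};

static struct toy_cpu toy_cpus[TOY_NR_CPUS];

static void toy_push(struct toy_cb **list, struct toy_cb *cb)
{
	cb->next = *list;
	*list = cb;
}

/* Models call_rcu() invoked with interrupts disabled: only stash the callback. */
static void toy_call_rcu_irqs_off(int cpu, struct toy_cb *cb)
{
	toy_push(&toy_cpus[cpu].deferred, cb);
}

/* Placeholder for step (2); a real version would wait for a grace period. */
static void toy_wait_for_grace_period(void)
{
}

/* Returns how many callbacks step (4) would have to wait for. */
static int toy_rcu_barrier(int this_cpu)
{
	struct toy_cb *pulled = NULL, *cb;
	int cpu, nr_to_wait_for = 0;

	/* (1) Pull in the callbacks still awaiting irq-work. */
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		while ((cb = toy_cpus[cpu].deferred) != NULL) {
			toy_cpus[cpu].deferred = cb->next;
			toy_push(&pulled, cb);
		}
	}

	/* (2) Wait out any irq-work handlers that were already running. */
	toy_wait_for_grace_period();

	/* (3) Hand the pulled callbacks to the normal machinery. */
	while ((cb = pulled) != NULL) {
		pulled = cb->next;
		toy_push(&toy_cpus[this_cpu].cblist, cb);
	}

	/* (4) Proceed as today: visit every CPU's callback list. */
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		for (cb = toy_cpus[cpu].cblist; cb; cb = cb->next)
			nr_to_wait_for++;

	return nr_to_wait_for;
}
```

The point of the ordering is that by the time step (4) runs, nothing
is left on any deferred list, so the existing per-CPU scan sees every
callback that was queued before the barrier started.
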
> > the noise compared to acquiring a global mutex.
>
> Are you referring to a specific mutex, or just in general?

A specific mutex, namely the one that rcu_barrier() already acquires:

	mutex_lock(&rcu_state.barrier_mutex);

> > In your case, kfree_rcu() can be quite a bit lighter weight, though.
> > So the extra checks might not be lost in the noise.
>
> Assuming "the extra checks" means checking whether interrupts are
> disabled in call_rcu()*...

Yes, exactly.

> It will probably be fine since we invoke call_rcu() only when sheaves
> become full, not once for each object?

For the purposes only of my verbiage starting with "In your case,
kfree_rcu()", I was thinking in terms of these extra checks potentially
also being in kfree_rcu().

> Perhaps slightly off-topic; at some point, though, I think it'd be
> better to teach SLUB to use RCU polling API for RCU sheaves.
>
> e.g.) put sheaves into barn->sheaves_pending instead of invoking
> call_rcu(), and the SLUB alloc slowpath checks if there are sheaves
> past grace period before allocating and refilling sheaves.

That makes a great deal of sense to me!

In fact, this is one of the types of use case that I was thinking of back
when I was implementing the RCU polling API.  ;-)
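
A rough userspace sketch of that idea follows.  The toy_get_state()
and toy_poll_state() helpers are stand-ins that only mimic the
behavior of the kernel's get_state_synchronize_rcu() and
poll_state_synchronize_rcu(); the toy_sheaf and toy_barn structures
and the pending list are hypothetical, and there is no locking.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy grace-period counter; NOT how the kernel tracks grace periods. */
static unsigned long toy_gp_seq;

/* Mimics get_state_synchronize_rcu(): cookie for "one full GP from now". */
static unsigned long toy_get_state(void)
{
	return toy_gp_seq + 1;
}

/* Mimics poll_state_synchronize_rcu(): has that grace period elapsed? */
static bool toy_poll_state(unsigned long cookie)
{
	return toy_gp_seq >= cookie;
}

/* Test hook: pretend that one full grace period has elapsed. */
static void toy_grace_period_elapses(void)
{
	toy_gp_seq++;
}

struct toy_sheaf {
	struct toy_sheaf *next;
	unsigned long gp_cookie;	/* snapshot taken when the sheaf was queued */
};

struct toy_barn {
	struct toy_sheaf *pending_head;	/* oldest pending sheaf */
	struct toy_sheaf *pending_tail;	/* newest pending sheaf */
};

/* Free path: queue a full sheaf instead of invoking call_rcu(). */
static void barn_put_pending(struct toy_barn *barn, struct toy_sheaf *sheaf)
{
	sheaf->gp_cookie = toy_get_state();
	sheaf->next = NULL;
	if (barn->pending_tail)
		barn->pending_tail->next = sheaf;
	else
		barn->pending_head = sheaf;
	barn->pending_tail = sheaf;
}

/* Alloc slowpath: take the oldest sheaf only if its grace period is over. */
static struct toy_sheaf *barn_get_expired(struct toy_barn *barn)
{
	struct toy_sheaf *sheaf = barn->pending_head;

	if (!sheaf || !toy_poll_state(sheaf->gp_cookie))
		return NULL;
	barn->pending_head = sheaf->next;
	if (!barn->pending_head)
		barn->pending_tail = NULL;
	sheaf->next = NULL;
	return sheaf;
}
```

Because sheaves are queued in FIFO order, the slowpath only needs to
poll the cookie at the head of the list.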

							Thanx, Paul