On Mon, Mar 03, 2025 at 11:47:11AM -0500, Steven Rostedt wrote:
[...]
> > [   92.322347][   T28]  register_lock_class+0xb2/0xfc0
> > [   92.322366][   T28]  ? __lock_acquire+0xb97/0x16a0
> > [   92.322386][   T28]  ? __pfx_register_lock_class+0x10/0x10
> > [   92.322407][   T28]  ? do_perf_trace_lock.isra.0+0x10b/0x570
> > [   92.322427][   T28]  __lock_acquire+0xc3/0x16a0
> > [   92.322446][   T28]  ? __pfx___lock_release+0x10/0x10
> > [   92.322466][   T28]  ? rcu_is_watching+0x12/0xd0
> > [   92.322486][   T28]  lock_acquire+0x181/0x3a0
> > [   92.322505][   T28]  ? srcu_gp_start_if_needed+0x1a9/0x5f0
> > [   92.322522][   T28]  ? __pfx_lock_acquire+0x10/0x10
> > [   92.322541][   T28]  ? debug_object_active_state+0x2f1/0x3f0
> > [   92.322557][   T28]  ? do_raw_spin_trylock+0xb4/0x190
> > [   92.322570][   T28]  ? __pfx_do_raw_spin_trylock+0x10/0x10
> > [   92.322583][   T28]  ? __kmalloc_cache_noprof+0x1b9/0x450
> > [   92.322604][   T28]  _raw_spin_trylock+0x76/0xa0
> > [   92.322619][   T28]  ? srcu_gp_start_if_needed+0x1a9/0x5f0
> > [   92.322636][   T28]  srcu_gp_start_if_needed+0x1a9/0x5f0
> 
> The lock taken is from the passed in rcu_pending pointer.
> 
> > [   92.322655][   T28]  rcu_pending_enqueue+0x686/0xd30
> > [   92.322676][   T28]  ? __pfx_rcu_pending_enqueue+0x10/0x10
> > [   92.322693][   T28]  ? trace_lock_release+0x11a/0x180
> > [   92.322708][   T28]  ? bkey_cached_free+0xa3/0x170
> > [   92.322725][   T28]  ? lock_release+0x13/0x180
> > [   92.322744][   T28]  ? bkey_cached_free+0xa3/0x170
> > [   92.322760][   T28]  bkey_cached_free+0xfd/0x170
> 
> Which has:
> 
> static void bkey_cached_free(struct btree_key_cache *bc,
>                              struct bkey_cached *ck)
> {
>         kfree(ck->k);
>         ck->k           = NULL;
>         ck->u64s        = 0;
>                 
>         six_unlock_write(&ck->c.lock);
>         six_unlock_intent(&ck->c.lock);
> 
>         bool pcpu_readers = ck->c.lock.readers != NULL;
>         rcu_pending_enqueue(&bc->pending[pcpu_readers], &ck->rcu);
>         this_cpu_inc(*bc->nr_pending);
> }
> 
> So if that bc->pending[pcpu_readers] gets corrupted in anyway, that could 
> trigger this.

True. Another thing that could trigger this is corruption of the per-cpu
global data section, because the crash is happening in this trylock per the
above stack:

 srcu_gp_start_if_needed ->
        spin_lock_irqsave_sdp_contention(sdp) ->
                spin_trylock(sdp->lock)

        where sdp is a per-CPU pointer derived from ssp->sda, which is
        allocated from per-cpu storage.

So corruption of the per-cpu global data section can also trigger this, even
if the rcu_pending pointer is intact.
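
To make the distinction concrete, here is a minimal userspace sketch of the
two failure sources. It is not kernel code: every model_* name, the magic
value and the flat array standing in for per-cpu storage are assumptions
made up for illustration. It only shows that the lock being tried lives in
the per-cpu area, so scribbling over that area breaks the trylock while the
rcu_pending pointer itself is unchanged.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
#define MODEL_NR_CPUS    4
#define MODEL_LOCK_MAGIC 0xdead4eadu

enum model_result { MODEL_OK, MODEL_BUSY, MODEL_BAD_LOCK };

struct model_lock        { unsigned int magic; bool held; };
struct model_srcu_data   { struct model_lock lock; };          /* plays "sdp" */
struct model_srcu_struct { struct model_srcu_data *sda; };     /* plays "ssp" */
struct model_rcu_pending { struct model_srcu_struct *srcu; };

/* A flat array stands in for the per-cpu data section. */
static struct model_srcu_data model_percpu_area[MODEL_NR_CPUS];
static struct model_srcu_struct model_srcu = { .sda = model_percpu_area };

/* Put every per-CPU lock into a valid, unlocked state. */
static void model_init(struct model_rcu_pending *pending)
{
        for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
                model_percpu_area[cpu].lock.magic = MODEL_LOCK_MAGIC;
                model_percpu_area[cpu].lock.held = false;
        }
        pending->srcu = &model_srcu;
}

/* Trylock that first checks a magic value, loosely like a debug lock check. */
static enum model_result model_trylock(struct model_lock *lock)
{
        if (lock->magic != MODEL_LOCK_MAGIC)
                return MODEL_BAD_LOCK;
        if (lock->held)
                return MODEL_BUSY;
        lock->held = true;
        return MODEL_OK;
}

static void model_unlock(struct model_lock *lock)
{
        lock->held = false;
}

/* The lock is reached through pending, but it is stored in per-cpu data. */
static enum model_result model_enqueue(struct model_rcu_pending *pending,
                                       int cpu)
{
        assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

        struct model_srcu_data *sdp = &pending->srcu->sda[cpu];
        enum model_result ret = model_trylock(&sdp->lock);

        if (ret == MODEL_OK)
                model_unlock(&sdp->lock);
        return ret;
}
```

In this model, zeroing one model_percpu_area[] entry makes model_enqueue()
return MODEL_BAD_LOCK for that CPU only, with pending->srcu still pointing
at the same model_srcu.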

thanks,

 - Joel



