On Mon, Mar 03, 2025 at 11:47:11AM -0500, Steven Rostedt wrote:
[...]
> > [ 92.322347][ T28] register_lock_class+0xb2/0xfc0
> > [ 92.322366][ T28] ? __lock_acquire+0xb97/0x16a0
> > [ 92.322386][ T28] ? __pfx_register_lock_class+0x10/0x10
> > [ 92.322407][ T28] ? do_perf_trace_lock.isra.0+0x10b/0x570
> > [ 92.322427][ T28] __lock_acquire+0xc3/0x16a0
> > [ 92.322446][ T28] ? __pfx___lock_release+0x10/0x10
> > [ 92.322466][ T28] ? rcu_is_watching+0x12/0xd0
> > [ 92.322486][ T28] lock_acquire+0x181/0x3a0
> > [ 92.322505][ T28] ? srcu_gp_start_if_needed+0x1a9/0x5f0
> > [ 92.322522][ T28] ? __pfx_lock_acquire+0x10/0x10
> > [ 92.322541][ T28] ? debug_object_active_state+0x2f1/0x3f0
> > [ 92.322557][ T28] ? do_raw_spin_trylock+0xb4/0x190
> > [ 92.322570][ T28] ? __pfx_do_raw_spin_trylock+0x10/0x10
> > [ 92.322583][ T28] ? __kmalloc_cache_noprof+0x1b9/0x450
> > [ 92.322604][ T28] _raw_spin_trylock+0x76/0xa0
> > [ 92.322619][ T28] ? srcu_gp_start_if_needed+0x1a9/0x5f0
> > [ 92.322636][ T28] srcu_gp_start_if_needed+0x1a9/0x5f0
>
> The lock taken is from the passed in rcu_pending pointer.
>
> > [ 92.322655][ T28] rcu_pending_enqueue+0x686/0xd30
> > [ 92.322676][ T28] ? __pfx_rcu_pending_enqueue+0x10/0x10
> > [ 92.322693][ T28] ? trace_lock_release+0x11a/0x180
> > [ 92.322708][ T28] ? bkey_cached_free+0xa3/0x170
> > [ 92.322725][ T28] ? lock_release+0x13/0x180
> > [ 92.322744][ T28] ? bkey_cached_free+0xa3/0x170
> > [ 92.322760][ T28] bkey_cached_free+0xfd/0x170
>
> Which has:
>
> static void bkey_cached_free(struct btree_key_cache *bc,
>                              struct bkey_cached *ck)
> {
>         kfree(ck->k);
>         ck->k = NULL;
>         ck->u64s = 0;
>
>         six_unlock_write(&ck->c.lock);
>         six_unlock_intent(&ck->c.lock);
>
>         bool pcpu_readers = ck->c.lock.readers != NULL;
>         rcu_pending_enqueue(&bc->pending[pcpu_readers], &ck->rcu);
>         this_cpu_inc(*bc->nr_pending);
> }
>
> So if that bc->pending[pcpu_readers] gets corrupted in anyway, that could
> trigger this.

True, another thing that could cause this is corruption of the per-cpu
global data section, because the crash is happening in this trylock, per
the above stack:

  srcu_gp_start_if_needed ->
    spin_lock_irqsave_sdp_contention(sdp) ->
      spin_trylock(sdp->lock)

where sdp is ssp->sda and is allocated from per-cpu storage. So corruption
of the per-cpu global data section can also trigger this, even if the
rcu_pending pointer is intact.

thanks,

 - Joel