On 07.04.2025 04:39, Jarkko Sakkinen wrote: > From: Jarkko Sakkinen <jarkko.sakki...@opinsys.com> > > Add an isolated list of unreferenced keys to be queued for deletion, and > try to pin the keys in the garbage collector before processing anything. > Skip unpinnable keys. > > Use this list for blocking the reaping process during the teardown: > > 1. First off, the keys added to `keys_graveyard` are snapshotted, and the > list is flushed. This the very last step in `key_put()`. > 2. `key_put()` reaches zero. This will mark key as busy for the garbage > collector. > 3. `key_garbage_collector()` will try to increase refcount, which won't go > above zero. Whenever this happens, the key will be skipped. > > Cc: sta...@vger.kernel.org # v6.1+ > Signed-off-by: Jarkko Sakkinen <jarkko.sakki...@opinsys.com>
This patch landed in today's linux-next as commit b0d023797e3e ("keys: Add a list for unreferenced keys"). In my tests I found that it triggers the following lockdep issue: ================================ WARNING: inconsistent lock state 6.15.0-rc1-next-20250407 #15630 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/3/32 [HC0[0]:SC1[1]:HE1:SE0] takes: c13fdd68 (key_serial_lock){+.?.}-{2:2}, at: key_put+0x74/0x128 {SOFTIRQ-ON-W} state was registered at: lock_acquire+0x134/0x384 _raw_spin_lock+0x38/0x48 key_alloc+0x2fc/0x4d8 keyring_alloc+0x40/0x90 system_trusted_keyring_init+0x50/0x7c do_one_initcall+0x68/0x314 kernel_init_freeable+0x1c0/0x224 kernel_init+0x1c/0x12c ret_from_fork+0x14/0x28 irq event stamp: 234 hardirqs last enabled at (234): [<c0cb7060>] _raw_spin_unlock_irqrestore+0x5c/0x60 hardirqs last disabled at (233): [<c0cb6dd0>] _raw_spin_lock_irqsave+0x64/0x68 softirqs last enabled at (42): [<c013bcd8>] handle_softirqs+0x328/0x520 softirqs last disabled at (47): [<c013bf10>] run_ksoftirqd+0x40/0x68 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(key_serial_lock); <Interrupt> lock(key_serial_lock); *** DEADLOCK *** 1 lock held by ksoftirqd/3/32: #0: c137c040 (rcu_callback){....}-{0:0}, at: rcu_core+0x4d0/0x14a4 stack backtrace: CPU: 3 UID: 0 PID: 32 Comm: ksoftirqd/3 Not tainted 6.15.0-rc1-next-20250407 #15630 PREEMPT Hardware name: Samsung Exynos (Flattened Device Tree) Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x88 dump_stack_lvl from print_usage_bug.part.0+0x24c/0x270 print_usage_bug.part.0 from mark_lock.part.0+0xc20/0x129c mark_lock.part.0 from __lock_acquire+0xafc/0x2970 __lock_acquire from lock_acquire+0x134/0x384 lock_acquire from _raw_spin_lock+0x38/0x48 _raw_spin_lock from key_put+0x74/0x128 key_put from put_cred_rcu+0x20/0xd0 put_cred_rcu from rcu_core+0x478/0x14a4 rcu_core from handle_softirqs+0x130/0x520 handle_softirqs from run_ksoftirqd+0x40/0x68 run_ksoftirqd from smpboot_thread_fn+0x17c/0x330 smpboot_thread_fn from kthread+0x138/0x25c kthread from ret_from_fork+0x14/0x28 Exception stack(0xf090dfb0 to 0xf090dff8) ... To fix this issue I had to change all calls around key_serial_lock and key_graveyard_lock spinlocks with the irqsave/irqrestore variants (in security/keys/key.c and security/keys/gc.c), but I'm not sure if this is desired solution. > --- > v7: > - Fixed multiple definitions (from rebasing, sorry). > v6: > - Rebase went wrong in v5. > v5: > - Rebased on top of v6.15-rc > - Updated commit message to explain how spin lock and refcount > isolate the time window in key_put(). > v4: > - Pin the key while processing key type teardown. Skip dead keys. > - Revert key_gc_graveyard back key_gc_unused_keys. > - Rewrote the commit message. > - "unsigned long flags" declaration somehow did make to the previous > patch (sorry). > v3: > - Using spin_lock() fails since key_put() is executed inside IRQs. > Using spin_lock_irqsave() would neither work given the lock is > acquired for /proc/keys. Therefore, separate the lock for > graveyard and key_graveyard before reaping key_serial_tree. > v2: > - Rename key_gc_unused_keys as key_gc_graveyard, and re-document the > function. > --- > include/linux/key.h | 7 ++----- > security/keys/gc.c | 40 ++++++++++++++++++++++++---------------- > security/keys/internal.h | 5 +++++ > security/keys/key.c | 7 +++++-- > 4 files changed, 36 insertions(+), 23 deletions(-) > > diff --git a/include/linux/key.h b/include/linux/key.h > index ba05de8579ec..c50659184bdf 100644 > --- a/include/linux/key.h > +++ b/include/linux/key.h > @@ -195,10 +195,8 @@ enum key_state { > struct key { > refcount_t usage; /* number of references */ > key_serial_t serial; /* key serial number */ > - union { > - struct list_head graveyard_link; > - struct rb_node serial_node; > - }; > + struct list_head graveyard_link; /* key->usage == 0 */ > + struct rb_node serial_node; > #ifdef CONFIG_KEY_NOTIFICATIONS > struct watch_list *watchers; /* Entities watching this key > for changes */ > #endif > @@ -236,7 +234,6 @@ struct key { > #define KEY_FLAG_ROOT_CAN_INVAL 7 /* set if key can be > invalidated by root without permission */ > #define KEY_FLAG_KEEP 8 /* set if key should not be > removed */ > #define KEY_FLAG_UID_KEYRING 9 /* set if key is a user or user > session keyring */ > -#define KEY_FLAG_FINAL_PUT 10 /* set if final put has happened on key > */ > > /* the key type and key description string > * - the desc is used to match a key against search criteria > diff --git a/security/keys/gc.c b/security/keys/gc.c > index f27223ea4578..0a3beb68633c 100644 > --- a/security/keys/gc.c > +++ b/security/keys/gc.c > @@ -189,6 +189,7 @@ static void key_garbage_collector(struct work_struct > *work) > struct rb_node *cursor; > struct key *key; > time64_t new_timer, limit, expiry; > + unsigned long flags; > > kenter("[%lx,%x]", key_gc_flags, gc_state); > > @@ -206,21 +207,35 @@ static void key_garbage_collector(struct work_struct > *work) > > new_timer = TIME64_MAX; > > + spin_lock_irqsave(&key_graveyard_lock, flags); > + list_splice_init(&key_graveyard, &graveyard); > + spin_unlock_irqrestore(&key_graveyard_lock, flags); > + > + list_for_each_entry(key, &graveyard, graveyard_link) { > + spin_lock(&key_serial_lock); > + kdebug("unrefd key %d", key->serial); > + rb_erase(&key->serial_node, &key_serial_tree); > + spin_unlock(&key_serial_lock); > + } > + > /* As only this function is permitted to remove things from the key > * serial tree, if cursor is non-NULL then it will always point to a > * valid node in the tree - even if lock got dropped. > */ > spin_lock(&key_serial_lock); > + key = NULL; > cursor = rb_first(&key_serial_tree); > > continue_scanning: > + key_put(key); > while (cursor) { > key = rb_entry(cursor, struct key, serial_node); > cursor = rb_next(cursor); > - > - if (test_bit(KEY_FLAG_FINAL_PUT, &key->flags)) { > - smp_mb(); /* Clobber key->user after FINAL_PUT seen. */ > - goto found_unreferenced_key; > + /* key_get(), unless zero: */ > + if (!refcount_inc_not_zero(&key->usage)) { > + key = NULL; > + gc_state |= KEY_GC_REAP_AGAIN; > + goto skip_dead_key; > } > > if (unlikely(gc_state & KEY_GC_REAPING_DEAD_1)) { > @@ -274,6 +289,7 @@ static void key_garbage_collector(struct work_struct > *work) > spin_lock(&key_serial_lock); > goto continue_scanning; > } > + key_put(key); > > /* We've completed the pass. Set the timer if we need to and queue a > * new cycle if necessary. We keep executing cycles until we find one > @@ -286,6 +302,10 @@ static void key_garbage_collector(struct work_struct > *work) > key_schedule_gc(new_timer); > } > > + spin_lock(&key_graveyard_lock); > + list_splice_init(&key_graveyard, &graveyard); > + spin_unlock(&key_graveyard_lock); > + > if (unlikely(gc_state & KEY_GC_REAPING_DEAD_2) || > !list_empty(&graveyard)) { > /* Make sure that all pending keyring payload destructions are > @@ -328,18 +348,6 @@ static void key_garbage_collector(struct work_struct > *work) > kleave(" [end %x]", gc_state); > return; > > - /* We found an unreferenced key - once we've removed it from the tree, > - * we can safely drop the lock. > - */ > -found_unreferenced_key: > - kdebug("unrefd key %d", key->serial); > - rb_erase(&key->serial_node, &key_serial_tree); > - spin_unlock(&key_serial_lock); > - > - list_add_tail(&key->graveyard_link, &graveyard); > - gc_state |= KEY_GC_REAP_AGAIN; > - goto maybe_resched; > - > /* We found a restricted keyring and need to update the restriction if > * it is associated with the dead key type. > */ > diff --git a/security/keys/internal.h b/security/keys/internal.h > index 2cffa6dc8255..4e3d9b322390 100644 > --- a/security/keys/internal.h > +++ b/security/keys/internal.h > @@ -63,9 +63,14 @@ struct key_user { > int qnbytes; /* number of bytes allocated to > this user */ > }; > > +extern struct list_head key_graveyard; > +extern spinlock_t key_graveyard_lock; > + > extern struct rb_root key_user_tree; > extern spinlock_t key_user_lock; > extern struct key_user root_key_user; > +extern struct list_head key_graveyard; > +extern spinlock_t key_graveyard_lock; > > extern struct key_user *key_user_lookup(kuid_t uid); > extern void key_user_put(struct key_user *user); > diff --git a/security/keys/key.c b/security/keys/key.c > index 7198cd2ac3a3..7511f2017b6b 100644 > --- a/security/keys/key.c > +++ b/security/keys/key.c > @@ -22,6 +22,8 @@ DEFINE_SPINLOCK(key_serial_lock); > > struct rb_root key_user_tree; /* tree of quota records indexed by UID > */ > DEFINE_SPINLOCK(key_user_lock); > +LIST_HEAD(key_graveyard); > +DEFINE_SPINLOCK(key_graveyard_lock); > > unsigned int key_quota_root_maxkeys = 1000000; /* root's key count > quota */ > unsigned int key_quota_root_maxbytes = 25000000; /* root's key space quota > */ > @@ -658,8 +660,9 @@ void key_put(struct key *key) > key->user->qnbytes -= key->quotalen; > spin_unlock_irqrestore(&key->user->lock, flags); > } > - smp_mb(); /* key->user before FINAL_PUT set. */ > - set_bit(KEY_FLAG_FINAL_PUT, &key->flags); > + spin_lock_irqsave(&key_graveyard_lock, flags); > + list_add_tail(&key->graveyard_link, &key_graveyard); > + spin_unlock_irqrestore(&key_graveyard_lock, flags); > schedule_work(&key_gc_work); > } > } Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland