On Sat, May 30, 2026 at 12:00 AM Guangshuo Li <[email protected]> wrote: > > commit 2d1f7b65f5de ("dm cache policy smq: fix missing locks in > invalidating cache blocks") added mq->lock around the destructive part of > smq_invalidate_mapping(), but left the e->allocated check outside the > critical section. > > That leaves a check-then-act race. Two concurrent invalidators can both > observe e->allocated as true before either of them takes mq->lock. The > first invalidator that acquires the lock removes the entry from the > queues and hash table and then calls free_entry(), which clears > e->allocated and puts the entry back on the free list. The second > invalidator can then acquire mq->lock and continue with the stale result > of the unlocked check. > > This can corrupt the SMQ queues or hash table by deleting an entry that > is no longer on those structures. It can also hit the allocation check in > free_entry() when the same entry is freed again. > > Move the allocation check under mq->lock so the predicate and the > destructive operations are serialized by the same lock. > > Fixes: 2d1f7b65f5de ("dm cache policy smq: fix missing locks in invalidating > cache blocks") > Signed-off-by: Guangshuo Li <[email protected]> > --- > drivers/md/dm-cache-policy-smq.c | 12 ++++++++---- > 1 file changed, 8 insertions(+), 4 deletions(-) > > diff --git a/drivers/md/dm-cache-policy-smq.c > b/drivers/md/dm-cache-policy-smq.c > index dd77a93fd68d..1ae304c2f573 100644 > --- a/drivers/md/dm-cache-policy-smq.c > +++ b/drivers/md/dm-cache-policy-smq.c > @@ -1590,18 +1590,22 @@ static int smq_invalidate_mapping(struct > dm_cache_policy *p, dm_cblock_t cblock) > struct smq_policy *mq = to_smq_policy(p); > struct entry *e = get_entry(&mq->cache_alloc, from_cblock(cblock)); > unsigned long flags; > - > - if (!e->allocated) > - return -ENODATA; > + int r = 0; > > spin_lock_irqsave(&mq->lock, flags); > + if (!e->allocated) { > + r = -ENODATA; > + goto out; > + } > // FIXME: what if this block has pending background work? > del_queue(mq, e); > h_remove(&mq->table, e); > free_entry(&mq->cache_alloc, e); > + > +out: > spin_unlock_irqrestore(&mq->lock, flags); > > - return 0; > + return r; > } > > static uint32_t smq_get_hint(struct dm_cache_policy *p, dm_cblock_t cblock) > -- > 2.43.0 >
Thanks for the fix. While my previous commit addressed the race between concurrent workers, it doesn't cover the invalidate_cblocks message path, where the race between the message and write invalidation wasn't solved. Moving the allocation check under the lock is the right fix for the remaining gap. I've verified the bug is reproducible. By running a passthrough write and a dmsetup message simultaneously, the check-then-act race on e->allocated leads to a double free of the cache entry, hitting the BUG_ON in free_entry() on the second call. The patch fixes it as expected. On the other hand, the write path itself is safe: the prison cell serialization prevents two workers from invalidating a cache block simultaneously. Reviewed-by: Ming-Hung Tsai <[email protected]>

