On Wed, Jul 30, 2025 at 2:50 AM K Prateek Nayak <kprateek.na...@amd.com> wrote:
> On 7/30/2025 1:57 PM, Maarten Lankhorst wrote:
> > Hey,
> >
> > This warning is introduced in linux-next as a4f0b6fef4b0 ("locking/mutex:
> > Add p->blocked_on wrappers for correctness checks")
> > Adding relevant people from that commit.
> > ...
> >> ------------[ cut here ]------------
> >> WARNING: ./include/linux/sched.h:2173 at __clear_task_blocked_on
> >> include/linux/sched.h:2173 [inline], CPU#1: syz.1.8698/395
> >> WARNING: ./include/linux/sched.h:2173 at __ww_mutex_wound+0x21a/0x2b0
> >> kernel/locking/ww_mutex.h:346, CPU#1: syz.1.8698/395
> >> Modules linked in:
> >> CPU: 1 UID: 0 PID: 395 Comm: syz.1.8698 Not tainted
> >> 6.16.0-rc6-next-20250718-syzkaller #0 PREEMPT(full)
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> Google 07/12/2025
> >> RIP: 0010:__clear_task_blocked_on include/linux/sched.h:2173 [inline]
> >> RIP: 0010:__ww_mutex_wound+0x21a/0x2b0 kernel/locking/ww_mutex.h:346
>
> When wounding the lock owner, could it be possible that the lock
> owner is blocked on a different nested lock? Lock owner implies it
> is not blocked on the current lock we are trying to wound right?
>
> I remember John mentioning seeing circular chains in find_proxy_task()
> which required this but looking at this call-chain I'm wondering if
> only the __ww_mutex_check_waiters() (or some other path) requires
> __clear_task_blocked_on() for that case.

So yeah, I have tripped over this a few times (fixing it and often
re-introducing the problem later), but usually further along in my full
proxy-exec series, and I somehow missed that the single-rq series hits
it as well.

With __ww_mutex_die() we are clearing the blocked_on relationship for
the lock *waiter*, so passing the lock is fine there. In
__ww_mutex_wound(), however, we are waking the lock *owner*, who might
be waiting on a different lock, so passing the held lock to the
__clear_task_blocked_on() checks trips these warnings.

Passing NULL instead of the lock is the right call here. I'll just need
to loosen the __clear_task_blocked_on() check to accept NULL as well.

I'll spin up a quick patch.

thanks
-john
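
For illustration only, here is a rough userspace sketch of the loosened
check described above. This is not the kernel patch: the struct layouts,
the warn_count counter, the warn_if() helper, and the name
clear_blocked_on_sketch() are all made up for the example, and locking
is ignored entirely.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real kernel structs are far larger. */
struct mutex { int unused; };
struct task_struct { struct mutex *blocked_on; };

/* Counts warnings so the behaviour can be checked (assumption, not kernel API). */
static int warn_count;

/* Minimal stand-in for a WARN-style check: report and count, never abort. */
static void warn_if(int cond, const char *what)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
}

/*
 * Clear p's blocked_on relationship.
 *
 * A non-NULL m means "p should be blocked on m (or on nothing)", so a
 * mismatch is flagged. A NULL m means the caller does not know which
 * lock p is blocked on, as when waking a lock owner, so the mismatch
 * check is skipped and the relationship is cleared unconditionally.
 */
static void clear_blocked_on_sketch(struct task_struct *p, struct mutex *m)
{
	warn_if(m && p->blocked_on && p->blocked_on != m,
		"clearing blocked_on for a different lock");
	p->blocked_on = NULL;
}
```

In this sketch, an owner blocked on some other lock trips the warning
when the held lock is passed, and stays quiet when NULL is passed.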