Am 11.03.2026 um 00:08 hat Jorge Merlino geschrieben: > There is a race condition on the value of throttle_timers on the > ThrottleGroupMember structure. Those timers should be protected by the > ThrottleGroup lock but sometimes are read without the lock and the code > expects their value to remain constant. > > In particular, there is an assertion that can be false as the timers can > change between their value is checked and the assertion is run. > > Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3194 > > Signed-off-by: Jorge Merlino <[email protected]>
For context, there was another attempt a while ago to fix this race: https://patchew.org/QEMU/[email protected]/ Unfortunately, neither the author nor Berto follow up on my response, so it went nowhere. As you can see, my conclusion then was, too, that the locking needs to be extended. So in that respect, I agree with your patch. I'm not completely clear on how much code needs to be covered by the lock. Is it okay for throttle_group_restart_queue_entry() to run only after the timer has been rescheduled? Also relevant context: Commit 25b8e4d, which introduced the assertion and stated that the situation shouldn't happen. > This patch fixes a race condition on an assertion on the value of the > ThrottleGroupMember throttle_timers. > > The patch is minimal as it changes only a few lines. It will probably > have to be refactored, maybe removing part of the code of the > throttle_group_restart_queue procedure and duplicating it before the > call. As it is now, this procedure needs to be called with the > ThrottleGroup lock held as it will unlock it during its execution. > > I left it as is now so that the changes are clear for review. As I'm > messing with locks and I'm not an expert on this codebase I'm not sure > if there could be side effects I'm not aware of. Yes, there are clearly things to be improved so that lock/unlock happen in the same function, but let's first figure out what actually needs to be done before we worry about the refactoring. Kevin
