When qemu_coroutine_enter is called inside a list walk (even a
QLIST_FOREACH_SAFE one), the entered coroutine can modify the list, for
example by removing an element. This causes problems when control
returns to the caller, which continues iterating over the same list:
the next pointer saved by the _SAFE variant may refer to the element
that was just removed.

Patch 1 solves the issue in blkdebug_debug_resume by restarting the list
walk after every qemu_coroutine_enter if the list has to be fully
iterated. Patches 2, 3 and 4 fix blkdebug_debug_event by gathering all
the actions that the rules request in a counter and invoking the
respective qemu_coroutine_yield only after all requests have been
processed.

Patches 5 and 6 are somewhat independent of the others: patch 5 removes
the need for the new_state field, and patch 6 adds a lock to protect
rules and suspended_reqs. Right now everything works because it is
protected by the AioContext lock. This is a preparation for the current
proposal of removing the AioContext lock in favor of finer-grained
locks, to allow multiple iothreads to execute in the same block device.

Signed-off-by: Emanuele Giuseppe Esposito <eespo...@redhat.com>
---
v4:
* Patch 5 (new): get rid of new_state and instead use a local variable
* Patch 6: move the state update inside the same guard lock where the
  new one is decided, to have a single critical section and avoid
  use-before-update.

Emanuele Giuseppe Esposito (6):
  blkdebug: refactor removal of a suspended request
  blkdebug: move post-resume handling to resume_req_by_tag
  blkdebug: track all actions
  blkdebug: do not suspend in the middle of QLIST_FOREACH_SAFE
  block/blkdebug: remove new_state field and instead use a local
    variable
  blkdebug: protect rules and suspended_reqs with a lock

 block/blkdebug.c | 128 +++++++++++++++++++++++++++++++----------------
 1 file changed, 84 insertions(+), 44 deletions(-)

-- 
2.30.2