On 30.05.2025 at 17:10, Fiona Ebner wrote:
> This series is an attempt to fix a deadlock issue reported by Andrey
> here [3].
>
> bdrv_drained_begin() polls and is not allowed to be called with the
> block graph lock held. Mark the function as GRAPH_UNLOCKED.
>
> This alone does not catch the issue reported by Andrey, because there
> is a bdrv_graph_rdunlock_main_loop() before bdrv_drained_begin() in
> the function bdrv_change_aio_context(). That unlock is of course
> ineffective if the exclusive lock is held, but it prevents TSA from
> finding the issue.
>
> Thus the bdrv_drained_begin() call from inside
> bdrv_change_aio_context() needs to be moved up the call stack before
> acquiring the locks. This is the bulk of the series.
>
> Granular draining is not trivially possible, because many of the
> affected functions can recursively call themselves.
>
> In places where bdrv_drained_begin() calls were removed, assertions
> are added, checking the quiesced_counter to ensure that the nodes
> already got drained further up in the call stack.
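For readers following along, the pattern described in the cover letter can be sketched with a small toy model. This is not QEMU code: ToyNode, toy_graph_wrlocked and every toy_* function are hypothetical stand-ins invented for illustration, and the real series relies on TSA annotations rather than a runtime flag. It only shows the ordering (drain first, then lock) and the assertion that replaces the inner drain.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; these are NOT QEMU's real types or functions. */
typedef struct ToyNode {
    int quiesce_counter;   /* > 0 while the node is drained */
} ToyNode;

/* Stands in for the exclusive (writer) block graph lock. */
static bool toy_graph_wrlocked;

/* Draining polls, so it must never run with the graph lock held. */
static void toy_drained_begin(ToyNode *n)
{
    assert(!toy_graph_wrlocked);
    n->quiesce_counter++;
}

static void toy_drained_end(ToyNode *n)
{
    assert(n->quiesce_counter > 0);
    n->quiesce_counter--;
}

/* Inner function: no longer drains by itself, only checks that a
 * caller further up the stack has already drained the node. */
static void toy_change_context(ToyNode *n)
{
    assert(toy_graph_wrlocked);
    assert(n->quiesce_counter > 0);
    /* ... the actual context change would happen here ... */
}

/* Outer caller: drain first, then take the lock, then do the work. */
static void toy_caller(ToyNode *n)
{
    toy_drained_begin(n);
    toy_graph_wrlocked = true;
    toy_change_context(n);
    toy_graph_wrlocked = false;
    toy_drained_end(n);
}
```

Calling toy_drained_begin() from inside toy_change_context() would trip the first assertion, which is the runtime analogue of the deadlock the series is meant to prevent.
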
I have finished reviewing this series. I had some minor comments on
patches 24, 27 and 41. Once we agree on what to do there, I can probably
just make any changes myself while applying.

Kevin