On 30.05.2025 at 17:10, Fiona Ebner wrote:
> This series is an attempt to fix a deadlock issue reported by Andrey
> here [3].
> 
> bdrv_drained_begin() polls and is not allowed to be called with the
> block graph lock held. Mark the function as GRAPH_UNLOCKED.
> 
> This alone does not catch the issue reported by Andrey, because there
> is a bdrv_graph_rdunlock_main_loop() before bdrv_drained_begin() in
> the function bdrv_change_aio_context(). That unlock is of course
> ineffective if the exclusive lock is held, but it prevents TSA from
> finding the issue.
> 
> Thus the bdrv_drained_begin() call from inside
> bdrv_change_aio_context() needs to be moved up the call stack before
> acquiring the locks. This is the bulk of the series.
> 
> Granular draining is not trivially possible, because many of the
> affected functions can recursively call themselves.
> 
> In places where bdrv_drained_begin() calls were removed, assertions
> are added, checking the quiesced_counter to ensure that the nodes
> already got drained further up in the call stack.
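
For readers who don't have the series in front of them, the restructuring
described above can be sketched roughly as follows. This is only a toy
model and not code from the patches: ToyNode, quiesce_count,
toy_graph_wrlocked and all toy_*() functions are hypothetical stand-ins
for the real block layer, and the lock is modelled as a plain flag. It
shows the drain moving out of the inner function into the caller, ahead of
taking the lock, with an assertion left behind in its place.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins only; none of these are QEMU's real types or functions. */
typedef struct ToyNode {
    int quiesce_count;          /* > 0 while the node is drained */
} ToyNode;

static bool toy_graph_wrlocked; /* models the exclusive graph lock as a flag */

/* Draining polls, so it must never run while the graph lock is held. */
static void toy_drained_begin(ToyNode *n)
{
    assert(!toy_graph_wrlocked);
    n->quiesce_count++;
}

static void toy_drained_end(ToyNode *n)
{
    assert(!toy_graph_wrlocked);
    assert(n->quiesce_count > 0);
    n->quiesce_count--;
}

static void toy_graph_wrlock(void)
{
    assert(!toy_graph_wrlocked);
    toy_graph_wrlocked = true;
}

static void toy_graph_wrunlock(void)
{
    assert(toy_graph_wrlocked);
    toy_graph_wrlocked = false;
}

/*
 * Inner function: it no longer drains on its own. It only asserts that
 * some caller further up the stack has already drained the node.
 */
static void toy_change_context(ToyNode *n, int *ctx, int new_ctx)
{
    assert(n->quiesce_count > 0);
    *ctx = new_ctx;
}

/* Outer caller: drain first, then lock, then call; undo in reverse order. */
static int toy_move_node(ToyNode *n, int new_ctx)
{
    int ctx = 0;

    toy_drained_begin(n);
    toy_graph_wrlock();
    toy_change_context(n, &ctx, new_ctx);
    toy_graph_wrunlock();
    toy_drained_end(n);
    return ctx;
}
```

In this model, calling toy_drained_begin() after toy_graph_wrlock() trips
the first assertion immediately, which is the kind of ordering mistake the
series wants caught at the call site rather than showing up as a deadlock
at run time.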

I finished review for this series. I had some minor comments on patches
24, 27 and 41. Once we agree what to do there, I can probably just make
any changes myself while applying.

Kevin

