This series is an attempt to fix a deadlock issue reported by Andrey
here [0].

bdrv_drained_begin() polls and is not allowed to be called with the
block graph lock held. Mark the function as GRAPH_UNLOCKED.

This does not catch the issue reported by Andrey, because there
is a bdrv_graph_rdunlock_main_loop() before bdrv_drained_begin() in
the function bdrv_change_aio_context(). That unlock is of course
ineffective if the exclusive lock is held, but it seems to prevent TSA
from finding the issue.

Thus the bdrv_drained_begin() call from inside
bdrv_change_aio_context() needs to be moved up the call stack before
acquiring the locks. This required the most changes, patch 09/11.

Granular draining is not trivially possible, because many of the
affected functions can recursively call themselves.

In place where bdrv_drained_begin() calls were removed, assertions
are added, checking the quiesced_counter to ensure that the nodes
already got drained further up in the call stack.

In bdrv_graph_wrlock() there is a comment that it uses
bdrv_drain_all_begin_nopoll() to make sure that constantly arriving
new I/O doesn't cause starvation. The changes from this series are at
odds with that, but there doesn't seem to be any (new) test failures.

[0]: 
https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63...@virtuozzo.com/

I did not CC the maintainers for blkverify, vmdk, etc. because this is
just an RFC and the changes there are mechanical, for adapting some
callers of block layer functions.

Andrey Drobyshev (1):
  iotests/graph-changes-while-io: add test case with removal of lower
    snapshot

Fiona Ebner (10):
  block: remove outdated comments about AioContext locking
  block: move drain outside of read-locked bdrv_reopen_queue_child()
  block/snapshot: move drain outside of read-locked
    bdrv_snapshot_delete()
  block: drain while unlocked in bdrv_reopen_parse_file_or_backing()
  block: move drain outside of read-locked bdrv_inactivate_recurse()
  blockdev: drain while unlocked in internal_snapshot_action()
  blockdev: drain while unlocked in external_snapshot_action()
  block: mark bdrv_drained_begin() as GRAPH_UNLOCKED
  block: move drain out of bdrv_change_aio_context()
  block/graph-lock: add drain flag to bdrv_graph_wr{,un}lock

 block.c                                       | 170 +++++++++++-------
 block/backup.c                                |   4 +-
 block/blklogwrites.c                          |   8 +-
 block/blkverify.c                             |   4 +-
 block/block-backend.c                         |   8 +-
 block/commit.c                                |  16 +-
 block/graph-lock.c                            |  22 ++-
 block/mirror.c                                |  22 +--
 block/qcow2.c                                 |   4 +-
 block/quorum.c                                |   8 +-
 block/replication.c                           |  14 +-
 block/snapshot.c                              |  22 ++-
 block/stream.c                                |  18 +-
 block/vmdk.c                                  |  20 +--
 blockdev.c                                    |  59 ++++--
 blockjob.c                                    |  12 +-
 include/block/block-global-state.h            |   8 +-
 include/block/block-io.h                      |   3 +-
 include/block/graph-lock.h                    |   8 +-
 qemu-img.c                                    |   2 +
 scripts/block-coroutine-wrapper.py            |   4 +-
 .../qemu-iotests/tests/graph-changes-while-io | 101 ++++++++++-
 .../tests/graph-changes-while-io.out          |   4 +-
 tests/unit/test-bdrv-drain.c                  |  40 ++---
 tests/unit/test-bdrv-graph-mod.c              |  20 +--
 25 files changed, 384 insertions(+), 217 deletions(-)

-- 
2.39.5



Reply via email to