The combination of iothreads, block jobs and drain in particular currently tends to lead to hangs. This series fixes a few of these bugs; more remain and will be addressed in separate patches.

The primary goal of this series is to fix the scenario from:

https://bugzilla.redhat.com/show_bug.cgi?id=1601212

A simplified reproducer of the reported problem looks like this (two
concurrent commit block jobs for disks in an iothread):

$qemu -qmp stdio \
    -object iothread,id=iothread1 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=0x6,iothread=iothread1 \
    -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=hd0 \
    -device scsi-hd,drive=drive_image1,id=image1,bootindex=1 \
    -drive id=drive_image2,if=none,snapshot=off,aio=threads,cache=none,format=qcow2,file=hd1 \
    -device scsi-hd,drive=drive_image2,id=image2,bootindex=2

{"execute":"qmp_capabilities"}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn1"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn11"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image1","snapshot-file":"sn111"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn2"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn22"}}
{"execute":"blockdev-snapshot-sync","arguments":{"device":"drive_image2","snapshot-file":"sn222"}}
{ "execute": "block-commit", "arguments": { "device": "drive_image2","base":"sn2","backing-file":"sn2","top":"sn22"}}
{ "execute": "block-commit", "arguments": { "device": "drive_image1","base":"sn1","backing-file":"sn1","top":"sn11"}}
{"execute":"quit"}

v3:
- Patch 3 ('aio-wait: Increase num_waiters even in home thread'):
  Hoist atomic_inc/dec outside the if [Fam, Paolo]
- Patch 10 ('block-backend: Fix potential double blk_delete()'):
  Assert in blk_unref() that drain doesn't resurrect the BB [Paolo]
- Patch 11 ('block-backend: Decrease in_flight only after callback'):
  Removed bdrv_ref/unref pair [Paolo]
- v2 Patch 12 ('mirror: Fix potential use-after-free in active'):
  Dropped. It just papered over another bug that is fixed later.
- v3 Patch 17 ('test-bdrv-drain: Fix outdated comments'):
  New patch with comment improvements [Max]
- v3 Patch 18 ('block: Use a single global AioWait'):
  v3 Patch 19 ('test-bdrv-drain: Test draining job source child and parent'):
  New patches to fix an additional hang that was caused by notifying the
  wrong AioWait

v2:
- Rebased on top of mreitz/block (including fixes for new bugs: patch 1
  and 16)
- Patch 12: Added missing bdrv_unref() calls in error path [Fam]

Kevin Wolf (19):
  job: Fix missing locking due to mismerge
  blockjob: Wake up BDS when job becomes idle
  aio-wait: Increase num_waiters even in home thread
  test-bdrv-drain: Drain with block jobs in an I/O thread
  test-blockjob: Acquire AioContext around job_cancel_sync()
  job: Use AIO_WAIT_WHILE() in job_finish_sync()
  test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
  block: Add missing locking in bdrv_co_drain_bh_cb()
  block-backend: Add .drained_poll callback
  block-backend: Fix potential double blk_delete()
  block-backend: Decrease in_flight only after callback
  blockjob: Lie better in child_job_drained_poll()
  block: Remove aio_poll() in bdrv_drain_poll variants
  test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
  job: Avoid deadlocks in job_completed_txn_abort()
  test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort
  test-bdrv-drain: Fix outdated comments
  block: Use a single global AioWait
  test-bdrv-drain: Test draining job source child and parent

 include/block/aio-wait.h  |  17 ++-
 include/block/block.h     |   6 +-
 include/block/block_int.h |   3 -
 include/block/blockjob.h  |   3 +
 include/qemu/coroutine.h  |   5 +
 include/qemu/job.h        |  12 ++
 block.c                   |   5 -
 block/block-backend.c     |  31 +++--
 block/io.c                |  30 ++---
 blockjob.c                |   9 +-
 job.c                     |  49 +++++---
 tests/test-bdrv-drain.c   | 292 +++++++++++++++++++++++++++++++++++++++++++---
 tests/test-blockjob.c     |   6 +
 util/aio-wait.c           |  11 +-
 util/qemu-coroutine.c     |   5 +
 15 files changed, 402 insertions(+), 82 deletions(-)

-- 
2.13.6