On Tue, Feb 16, 2016 at 06:56:12PM +0100, Paolo Bonzini wrote:
> So the fine-grained locking series has grown from 2 parts to 3...
>
> This first part stops where we remove RFifoLock.
>
> In the previous submission, the bulk of the series took care of
> associating an AioContext to a thread, so that aio_poll could run event
> handlers only on that thread. This was done to fix a race in bdrv_drain.
> There were two disadvantages:
>
> 1) reporting progress from aio_poll to the main thread was Really Hard.
> Despite being relatively sure of the correctness---also thanks to formal
> models---it's not the kind of code that I'd lightly donate to posterity.
>
> 2) the strict dependency between bdrv_drain and a single AioContext
> would have been bad for multiqueue.
>
> Instead, this series does the same trick (do not run dataplane event
> handlers from the main thread) but reports progress directly at the
> BlockDriverState level.
>
> To do so, the mechanism to track in-flight requests is changed to a
> simple counter. This is more flexible than BdrvTrackedRequest, and lets
> the block/io.c code track precisely the initiation and completion of the
> requests. In particular, the counter remains nonzero when a bottom half
> is required to "really" complete the request. bdrv_drain no longer
> relies on a non-blocking aio_poll to execute those bottom halves.
>
> It is also more efficient; while BdrvTrackedRequest has to remain for
> request serialisation, with this series we can in principle make
> BdrvTrackedRequest optional and enable it only when something (automatic
> RMW or copy-on-read) requires request serialisation.
>
> In general this flows nicely, the only snag being patch 5. The problem
> here is that mirror is calling bdrv_drain from an AIO callback, which
> used to be okay (because bdrv_drain more or less tried to guess when all
> AIO callbacks were done) but now causes a deadlock (because bdrv_drain
> really checks that AIO callbacks are complete). To add to the
> complication, the bdrv_drain happens far away from the AIO callback, in
> the coroutine that the AIO callback enters. The situation here is
> admittedly underdefined, and Stefan pointed out that the same solution
> is found in many other places in the QEMU block layer. Therefore I think
> it's acceptable. I'm pointing it out explicitly though, because
> (together with patch 8) it is an example of the issues caused by the
> new, stricter definition of bdrv_drain.
>
> Patches 1-7 organize bdrv_drain into separate phases, with a
> well-defined order between the BDS and its children. The phases are:
> 1) disable throttling; 2) disable io_plug; 3) call bdrv_drain callbacks;
> 4) wait for I/O to finish; 5) re-enable io_plug and throttling.
> Previously, throttling and io_plug were handled by bdrv_flush_io_queue,
> which was repeatedly called as part of the I/O wait loop. This part of
> the series removes bdrv_flush_io_queue.
>
> Patches 8-11 replace aio_poll with bdrv_drain so that the work done so
> far applies to all former callers of aio_poll.
>
> Patches 12-13 introduce the logic that lets the main thread wait
> remotely for an iothread to drain the BDS. Patches 14-16 then achieve
> the purpose of this series, removing the long-running AioContext
> critical section from iothread_run and removing RFifoLock.
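To make the in-flight counter idea above concrete, here is a toy, self-contained
C sketch of the pattern being described. The names (ToyBDS, req_begin, req_end,
run_one_event, toy_drain) are invented for illustration and this is not the
actual block/io.c code: the counter goes up when a request starts, comes back
down only once the request is really complete (including any completion bottom
half), and drain simply keeps running event-loop iterations until it reaches
zero.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct ToyBDS {
    unsigned in_flight;   /* requests started but not yet fully completed */
} ToyBDS;

static void req_begin(ToyBDS *bs)
{
    bs->in_flight++;
}

static void req_end(ToyBDS *bs)
{
    /* Called only when the request is really finished, i.e. after any
     * completion bottom half has run, not merely when the AIO callback
     * fires.  Until then the counter stays nonzero. */
    assert(bs->in_flight > 0);
    bs->in_flight--;
}

/* Stand-in for one iteration of the event loop; here it just pretends
 * that each iteration completes one outstanding request. */
static bool run_one_event(ToyBDS *bs)
{
    if (bs->in_flight > 0) {
        req_end(bs);
        return true;
    }
    return false;
}

static void toy_drain(ToyBDS *bs)
{
    /* Wait for the counter to hit zero instead of guessing from the
     * return value of a non-blocking poll. */
    while (bs->in_flight > 0) {
        run_one_event(bs);
    }
}

int main(void)
{
    ToyBDS bs = { .in_flight = 0 };

    req_begin(&bs);
    req_begin(&bs);
    toy_drain(&bs);
    printf("in_flight after drain: %u\n", bs.in_flight);
    return 0;
}

In the real series the waiting is of course done through the AioContext event
loop rather than this busy loop; the toy only models the "wait until the
counter is zero" part described in the cover letter.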
> The series passes check-block.sh with a few fixes applied for
> pre-existing bugs:
>
> vl: fix migration from prelaunch state
> migration: fix incorrect memory_global_dirty_log_start outside BQL
> qed: fix bdrv_qed_drain
>
> Paolo
>
> Paolo Bonzini (16):
>   block: make bdrv_start_throttled_reqs return void
>   block: move restarting of throttled reqs to block/throttle-groups.c
>   block: introduce bdrv_no_throttling_begin/end
>   block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end
>   mirror: use bottom half to re-enter coroutine
>   block: add BDS field to count in-flight requests
>   block: change drain to look only at one child at a time
>   blockjob: introduce .drain callback for jobs
>   block: wait for all pending I/O when doing synchronous requests
>   nfs: replace aio_poll with bdrv_drain
>   sheepdog: disable dataplane
>   aio: introduce aio_context_in_iothread
>   block: only call aio_poll from iothread
>   iothread: release AioContext around aio_poll
>   qemu-thread: introduce QemuRecMutex
>   aio: convert from RFifoLock to QemuRecMutex
>
>  async.c                         |  28 +---
>  block.c                         |   4 +-
>  block/backup.c                  |   7 +
>  block/io.c                      | 281 +++++++++++++++++++++++++---
>  block/linux-aio.c               |  13 +-
>  block/mirror.c                  |  37 +++++-
>  block/nfs.c                     |  50 ++++---
>  block/qed-table.c               |  16 +--
>  block/qed.c                     |   4 +-
>  block/raw-aio.h                 |   2 +-
>  block/raw-posix.c               |  16 +--
>  block/sheepdog.c                |  19 +++
>  block/throttle-groups.c         |  19 +++
>  blockjob.c                      |  16 ++-
>  docs/multiple-iothreads.txt     |  40 +++---
>  include/block/aio.h             |  13 +-
>  include/block/block.h           |   3 +-
>  include/block/block_int.h       |  22 +++-
>  include/block/blockjob.h        |   7 +
>  include/block/throttle-groups.h |   1 +
>  include/qemu/rfifolock.h        |  54 --------
>  include/qemu/thread-posix.h     |   6 +
>  include/qemu/thread-win32.h     |  10 ++
>  include/qemu/thread.h           |   3 +
>  iothread.c                      |  20 +--
>  stubs/Makefile.objs             |   1 +
>  stubs/iothread.c                |   8 ++
>  tests/.gitignore                |   1 -
>  tests/Makefile                  |   2 -
>  tests/qemu-iotests/060          |   8 +-
>  tests/qemu-iotests/060.out      |   4 +-
>  tests/test-aio.c                |  22 ++--
>  tests/test-rfifolock.c          |  91 -------------
>  util/Makefile.objs              |   1 -
>  util/qemu-thread-posix.c        |  13 ++
>  util/qemu-thread-win32.c        |  25 ++++
>  util/rfifolock.c                |  78 -----------
>  37 files changed, 471 insertions(+), 474 deletions(-)
>  delete mode 100644 include/qemu/rfifolock.h
>  create mode 100644 stubs/iothread.c
>  delete mode 100644 tests/test-rfifolock.c
>  delete mode 100644 util/rfifolock.c
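On the last two patches in the list (QemuRecMutex and the RFifoLock removal):
as far as I understand it, once the long-running critical section is gone from
iothread_run, the only property the AioContext lock still needs is that the
thread already holding it can take it again. The standalone sketch below shows
that behaviour with a plain POSIX recursive mutex; it illustrates the locking
property only and is not QEMU's qemu-thread code.

/* Illustration only -- compile with: cc -pthread recmutex.c */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ctx_lock;

static void nested_helper(void)
{
    /* The same thread acquires the lock again; a recursive mutex allows
     * this, which is roughly what nested acquisitions of the AioContext
     * lock rely on. */
    pthread_mutex_lock(&ctx_lock);
    printf("nested acquisition ok\n");
    pthread_mutex_unlock(&ctx_lock);
}

int main(void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init(&ctx_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    pthread_mutex_lock(&ctx_lock);
    nested_helper();
    pthread_mutex_unlock(&ctx_lock);

    pthread_mutex_destroy(&ctx_lock);
    return 0;
}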
Looks good overall. I'm a little nervous about merging it for QEMU 2.6, but the block job, NBD, and dataplane tests should give it a good workout.

I have posted comments on a few patches.

Stefan