Hi all, this is the first non-RFC submission of my block job patches for 1.2. Everything is there, including multiple in-flight operations in the mirroring job and new testcases (for all of streaming, mirroring, hierarchical bitmap). The tests use blkdebug to test error reporting for both streaming and mirroring.
This still does not include a persistent dirty bitmap, which will be work for 1.3. If you want to tinker with this, everything is available at git://github.com/bonzini/qemu.git in branch blkmirror-job-1.2.

I know it's a lot of code, and I'm sorry for dropping this quite close to the feature freeze. Unfortunately, preparing for the Linux merge window and other non-QEMU tasks have dragged this out 1-2 weeks longer than I would have liked.

The patches are organized as follows:

01-12   preparatory work for block job errors, including support for pausing and resuming jobs

13-17   introduce block job errors, and add support in block-stream

18-26   preparatory work for block mirroring, including creating new functions out of existing code.

27-34   introduce a simple version of mirroring. The initial patch adds the mirroring logic, followed by the ability to switch to the destination of migration, to query the target file (for example, polling the high-water mark), and to handle errors during the job. All these changes come with testcases.

35-43   These patches introduce the first optimizations, namely supporting an arbitrary granularity for the dirty bitmap. The current default, 1M, is too coarse to let the job converge quickly and in almost real-time. These patches reimplement the block device dirty bitmap to allow efficient iteration, and add cluster copy-on-write logic. Cluster copy-on-write is needed because management will want to start the copy before the backing file is in place in the destination; if mirroring takes care of copy-on-write, BDRV_O_NO_BACKING can be used even if the granularity is smaller than the cluster size.

44-47   A second round of optimizations, replacing serialized read-write operations with multiple asynchronous I/O operations. The various in-flight operations can be of arbitrary size. The initial copy will end up reading large chunks sequentially (10M by default), while subsequent passes can mimic more closely the guest's I/O patterns.
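To make the cluster copy-on-write point in 35-43 a bit more concrete, here is a rough Python sketch of the rounding involved when the bitmap granularity is smaller than the target's cluster size. It is not code from the series: the function below only mirrors the idea behind round_to_clusters, and the 4K granularity and 64K cluster size are assumptions picked for the example.

```python
# Illustrative sketch only; not taken from the patches.
SECTOR_SIZE = 512  # bytes per sector


def round_to_clusters(sector_num, nb_sectors, cluster_sectors):
    """Expand [sector_num, sector_num + nb_sectors) to cluster boundaries.

    Returns (first_sector, sector_count) of the enclosing whole clusters.
    """
    start = (sector_num // cluster_sectors) * cluster_sectors   # align down
    end = sector_num + nb_sectors
    aligned_end = -(-end // cluster_sectors) * cluster_sectors  # align up
    return start, aligned_end - start


# Assumed sizes for the example: 4K bitmap granularity, 64K target clusters.
granularity_sectors = 4 * 1024 // SECTOR_SIZE    # 8 sectors per dirty bit
cluster_sectors = 64 * 1024 // SECTOR_SIZE       # 128 sectors per cluster

# Dirty bit number 5 covers sectors 40..47, but the copy has to cover the
# whole cluster so that the target cluster is fully populated.
dirty_bit = 5
first, count = round_to_clusters(dirty_bit * granularity_sectors,
                                 granularity_sectors, cluster_sectors)
print(first, count)
```

With these sizes, a single dirty 4K chunk turns into a copy of the whole 64K cluster around it, which is why the target does not need its backing file to be in place while the job runs.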
Compared to v1, the last four patches are entirely new, and so are many of the testcase changes. All comments from Eric's review are addressed. In some cases the patches were modified (reversing if conditions or things like that) in order to keep later patches simpler. I also added several new tracepoints.

Latency is vital to any migration scheme using a dirty bitmap, especially because completion is entirely asynchronous, so I expect this to be used either with pretty good storage, or on guests doing relatively little I/O.

I tested this both on my laptop and with moderately high-end SAS disks. On the SAS disks, the time between checkpoints (trace_mirror_before_flush) during kernel compilation (-j3 to -j12, 4 or 8 vCPUs) is almost always within 1 second, and usually much less when targeting a local disk. On hibernation, which is a worst-case test (sequential I/O happening with no flushes in between) and which failed completely to converge on my lowly laptop hard disk, a checkpoint was reached every 0.5 to 3 seconds. When targeting a local qemu-nbd server, performance was similar. Kernel compilation showed occasional bumps, but they were resolved within 1.5-7 seconds.

Please review!
Paolo Bonzini (47):
  qapi: generalize documentation of streaming commands
  qerror/block: introduce QERR_BLOCK_JOB_NOT_ACTIVE
  block: move job APIs to separate files
  block: add block_job_query
  block: add support for job pause/resume
  qmp: add block-job-pause and block-job-resume
  qemu-iotests: add test for pausing a streaming operation
  block: rename block_job_complete to block_job_completed
  block: rename BlockErrorAction, BlockQMPEventAction
  block: move BlockdevOnError declaration to QAPI
  block: reorganize io error code
  block: sort BlockDeviceIoStatus errors by severity
  block: introduce block job error
  stream: add on-error argument
  blkdebug: process all set_state rules in the old state
  qemu-iotests: map underscore to dash in QMP argument names
  qemu-iotests: add tests for streaming error handling
  block: live snapshot documentation tweaks
  block: add bdrv_query_info
  block: add bdrv_query_stats
  block: add bdrv_ensure_backing_file
  block: make device optional in BlockInfo
  block: add target info to QMP query-blockjobs command
  block: introduce new dirty bitmap functionality
  block: add block-job-complete
  block: introduce BLOCK_JOB_READY event
  block: introduce mirror job
  qmp: add drive-mirror command
  mirror: support querying target file
  mirror: implement completion
  qemu-iotests: add mirroring test case
  block: forward bdrv_iostatus_reset to block job
  mirror: add support for on-source-error/on-target-error
  qmp: add pull_event function
  qemu-iotests: add testcases for mirroring on-source-error/on-target-error
  host-utils: add ffsl and flsl
  add hierarchical bitmap data type and test cases
  block: implement dirty bitmap using HBitmap
  block: make round_to_clusters public
  mirror: perform COW if the cluster size is bigger than the granularity
  block: return count of dirty sectors, not chunks
  block: allow customizing the granularity of the dirty bitmap
  mirror: allow customizing the granularity
  mirror: switch mirror_iteration to AIO
  mirror: add buf-size argument to drive-mirror
  mirror: support more than one in-flight AIO operation
  mirror: support arbitrarily-sized iterations

 Makefile.objs                 |   5 +-
 QMP/qmp-events.txt            |  43 +++
 QMP/qmp.py                    |  20 ++
 block-migration.c             |   8 +-
 block.c                       | 486 ++++++++++------------------
 block.h                       |  37 ++-
 block/Makefile.objs           |   3 +-
 block/blkdebug.c              |  14 +-
 block/mirror.c                | 562 +++++++++++++++++++++++++++++++++++
 block/stream.c                |  33 +-
 block_int.h                   | 192 +++---------
 blockdev.c                    | 257 +++++++++++++---
 blockjob.c                    | 290 ++++++++++++++++++
 blockjob.h                    | 285 ++++++++++++++++++
 hbitmap.c                     | 394 ++++++++++++++++++++++++
 hbitmap.h                     |  51 ++++
 hmp-commands.hx               |  73 ++++-
 hmp.c                         |  65 +++-
 hmp.h                         |   4 +
 host-utils.h                  |  45 +++
 hw/fdc.c                      |   4 +-
 hw/ide/core.c                 |  20 +-
 hw/scsi-disk.c                |  23 +-
 hw/scsi-generic.c             |   4 +-
 hw/virtio-blk.c               |  19 +-
 monitor.c                     |   2 +
 monitor.h                     |   2 +
 qapi-schema.json              | 238 +++++++++++++--
 qemu-tool.c                   |   6 +
 qerror.c                      |  12 +
 qerror.h                      |   9 +
 qmp-commands.hx               |  72 ++++-
 tests/Makefile                |   2 +
 tests/qemu-iotests/030        | 178 ++++++++++-
 tests/qemu-iotests/039        | 661 +++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group      |   3 +-
 tests/qemu-iotests/iotests.py |  19 +-
 tests/test-hbitmap.c          | 384 ++++++++++++++++++++++++
 trace-events                  |  24 +-
 39 files changed, 3946 insertions(+), 603 deletions(-)
 create mode 100644 block/mirror.c
 create mode 100644 blockjob.c
 create mode 100644 blockjob.h
 create mode 100644 hbitmap.c
 create mode 100644 hbitmap.h
 create mode 100755 tests/qemu-iotests/039
 create mode 100644 tests/test-hbitmap.c

--
1.7.10.4