On Mon, Jan 09, 2017 at 11:04:27AM +0000, Stefan Hajnoczi wrote:
> On Fri, Dec 23, 2016 at 05:28:43PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > Jeff or John: are you reviewing this?
It's in my review queue, but it would probably be a good one for John to
review as well, if he has time.

> > This is a new architecture for backup. It solves some current problems:
> >
> > 1. intersecting requests: currently, at request start we wait for all
> >    intersecting requests, which means that
> >    a. we may wait even for clusters unrelated to our request
> >    b. it is not fully async: if we are going to copy clusters 1,2,3,4
> >       while 2 and 4 are in flight, why should we wait for 2 and 4 to be
> >       fully copied? Why not start 1 and 3 in parallel with 2 and 4?
> >
> > 2. a notifier request is internally synchronous: if a notifier starts
> >    copying clusters 1,2,3,4, they will be copied one by one in a
> >    synchronous loop.
> >
> > 3. a notifier waits for the full copy of the corresponding clusters,
> >    when it actually only needs to wait for the _read_ operations to
> >    finish.
> >
> > In short, what is done:
> > 1. fully async scheme
> > 2. no intersecting requests
> > 3. notifiers wait only for reads, not for writes
> > 4. notifiers wait only for the corresponding clusters
> > 5. time limit for notifiers
> > 6. skip unallocated clusters in full mode
> > 7. use an HBitmap as the main backup bitmap and just initialize it from
> >    the dirty bitmap in the incremental case
> > 8. retrying: do not re-read on write failure
> >
> > # Intro
> >
> > Instead of sync copying + async notifiers as in the old backup, or AIO
> > requests as in mirror, this scheme just starts 24 workers - separate
> > coroutines, each of them copying clusters synchronously. Copying is done
> > one cluster at a time; there are no large requests.
> > The only difference for clusters awaited by write notifiers is a higher
> > priority. So notifiers do not start I/O requests themselves; they just
> > mark some clusters as awaited and yield. Then, when some worker completes
> > the read of the last cluster awaited by a notifier, it wakes that
> > notifier.
> >
> > # Some data structures
> >
> > Instead of done_bitmap - copy_bitmap, as in mirror.
> >
> > HBitmap copy_bitmap
> > Exactly what should be copied:
> > 0 - may mean one of three things:
> >     - this cluster should not be copied at all
> >     - this cluster is in flight
> >     - this cluster is already copied
> > 1 - means that the cluster should be copied, but is not touched yet (no
> >     async I/O exists for it)
> >
> > New bitmap: notif_wait_bitmap - not an HBitmap, just a Bitmap.
> > Exactly the in-flight clusters waiting for a read operation:
> > 0 - may mean one of three things:
> >     - this cluster should not be copied at all
> >     - this cluster is in flight and is _already_ read into memory
> >     - this cluster is already copied
> > 1 - means that the cluster is in flight, but the read operation has not
> >     finished yet
> > The only exception is none mode: in this case 1 means in flight: in I/O
> > read or write. This is needed for image fleecing.
> >
> > Cluster states (copy_bitmap, notif_wait_bitmap):
> >
> > 0, 0 - Ignored (should not be copied at all), or In flight (read done),
> >        or Copied
> > 0, 1 - In flight, read operation not finished (or write op. - for none
> >        mode)
> > 1, 0 - Should be copied, but not touched yet
> > 1, 1 - Impossible state
> >
> > NotifierRequest - a request from a notifier; it changes the sequence of
> > cluster copying by workers.
> > NotifierRequest {
> >     int64_t start;
> >     int64_t end;
> >     int nb_wait; // nb of clusters (in the specified range) that should
> >                  // be copied but are not yet read, i.e. clusters
> >                  // awaited by this notifier
> >     Coroutine *notif; // corresponding notifier coroutine
> > }
> >
> > notifier_reqs - list of notifier requests
> >
> > # More info
> >
> > At backup start, copy_bitmap is initialized from sync_bitmap for
> > incremental backup. For top/full backup it is initialized to all ones,
> > but in parallel with the workers the main coroutine skips unallocated
> > clusters.
> >
> > Worker coroutines copy clusters, preferring those awaited by notifiers
> > (those for which a NotifierRequest exists in the list). The function
> > get_work helps them.
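Just to check my understanding of the get_work priority rule, here is a
minimal, self-contained C sketch (the names, the plain-array bitmap, and
the single-request signature are simplified stand-ins of mine, not code
from the series): a worker first looks for a to-copy cluster inside a
pending NotifierRequest's range, and falls back to a plain scan of
copy_bitmap otherwise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define N_CLUSTERS 16

/* Simplified stand-in for the NotifierRequest described above. */
typedef struct {
    long start, end;   /* half-open range of clusters the notifier awaits */
    int  nb_wait;      /* clusters in the range still to be read */
} NotifierRequest;

/* Pick the next cluster for a worker: prefer clusters awaited by the
 * notifier (if any), otherwise the first cluster still marked to-copy
 * in copy_bitmap. Returns -1 when nothing is left. */
static long get_work(const bool copy_bitmap[N_CLUSTERS],
                     const NotifierRequest *req)
{
    if (req && req->nb_wait > 0) {
        for (long i = req->start; i < req->end && i < N_CLUSTERS; i++) {
            if (copy_bitmap[i]) {
                return i;
            }
        }
    }
    for (long i = 0; i < N_CLUSTERS; i++) {
        if (copy_bitmap[i]) {
            return i;
        }
    }
    return -1;
}
```

If that matches the intended semantics, the nice property is that the
priority is purely a scheduling hint: the same bitmap state machine is
used whether or not a notifier is waiting.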
> > Workers will copy clusters awaited by notifiers even if the block job
> > is paused - this is the same behaviour as in the old architecture.
> >
> > The old backup fails a guest write if the notifier fails to back up the
> > corresponding clusters. In the new scheme there is a small difference:
> > the notifier just waits for 5 s, and if backup can't copy all the
> > corresponding clusters in this time, the guest write fails.
> > Error scenarios were discussed on the list; the final decision was to
> > give the user the possibility to choose what should fail: the backup or
> > the guest write. I'll add this later.
> >
> > A worker can exit (no more clusters to copy, or a fatal error) or pause
> > (error, user pause, or throttling). When the last worker goes to pause,
> > it wakes up the main block-job coroutine, which will handle the user
> > pause or the errors. We need to handle errors in the main coroutine
> > because of the nature of block_job_error_action, which may yield.
> >
> > There is also a bonus: a new I/O retrying scheme: if there is an error
> > on read or write, the worker just yields in the retrying loop, and if
> > it is resumed (with job->error_exit = false) it will continue from the
> > same place. So if we have a failed write after a successful read, we
> > will not re-read.
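The "no re-read on write failure" point can be illustrated with a toy
model (entirely stand-in code of mine, not from the patches): the data
stays in the worker's buffer across retries, so only the write is
repeated, and a plain loop stands in for the yield/resume dance with the
main job coroutine.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters so we can observe how often each phase ran. */
static int reads, writes;

static bool fake_read(int cluster, char *buf)
{
    (void)cluster; (void)buf;
    reads++;
    return true;
}

static bool flaky_write(int cluster, const char *buf)
{
    (void)cluster; (void)buf;
    return ++writes >= 3;   /* fail twice, succeed on the third attempt */
}

/* Toy model of the retry idea: read the cluster once, then retry only
 * the write. In the real series the worker yields on error and continues
 * from the same place when resumed without error_exit. */
static int copy_cluster(int cluster, char *buf, int max_retries)
{
    if (!fake_read(cluster, buf)) {
        return -1;                      /* read error */
    }
    for (int i = 0; i <= max_retries; i++) {
        if (flaky_write(cluster, buf)) {
            return 0;                   /* done: the read never re-ran */
        }
        /* real code: yield here, wait to be resumed by the main job */
    }
    return -1;
}
```

Avoiding the re-read matters for the fleecing/notifier case, since a
second read could race with the guest write the notifier is holding back.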
> >
> > Vladimir Sementsov-Ogievskiy (21):
> >   backup: move from done_bitmap to copy_bitmap
> >   backup: init copy_bitmap from sync_bitmap for incremental
> >   backup: improve non-dirty bits progress processing
> >   backup: use copy_bitmap in incremental backup
> >   hbitmap: improve dirty iter
> >   backup: rewrite top mode cluster skipping
> >   backup: refactor: merge top/full/incremental backup code
> >   backup: skip unallocated clusters for full mode
> >   backup: separate copy function
> >   backup: refactor backup_copy_cluster()
> >   backup: move r/w error handling code to r/w functions
> >   iotests: add supported_cache_modes to main function
> >   coroutine: add qemu_coroutine_add_next
> >   block: add trace point on bdrv_close_all
> >   bitmap: add bitmap_count_between() function
> >   hbitmap: add hbitmap_count_between() function
> >   backup: make all reads not serializing
> >   backup: new async architecture
> >   backup: refactor backup_do_cow
> >   backup: move bitmap handling from backup_do_cow to get_work
> >   backup: refactor: remove backup_do_cow()
> >
> >  block.c                       |   1 +
> >  block/backup.c                | 871 +++++++++++++++++++++++++++-----------
> >  block/trace-events            |  34 +-
> >  blockjob.c                    |  29 +-
> >  include/block/blockjob.h      |  15 +-
> >  include/qemu/bitmap.h         |   4 +
> >  include/qemu/coroutine.h      |   2 +
> >  include/qemu/hbitmap.h        |  26 +-
> >  tests/qemu-iotests/055        |   4 +-
> >  tests/qemu-iotests/129        |   6 +-
> >  tests/qemu-iotests/iotests.py |   7 +-
> >  util/bitmap.c                 |  27 ++
> >  util/hbitmap.c                |  32 +-
> >  util/qemu-coroutine.c         |   7 +
> >  14 files changed, 805 insertions(+), 260 deletions(-)
> >
> > --
> > 1.8.3.1