Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben: > During mirror, if the target device does not have support zero > initialization, a mirror may result in a corrupt image.
I think you want to check this sentence. ("During mirror [...], a mirror may result [...]") > For instance, on mirror to a host device with format = raw, whatever > random data is on the target device will still be there for unallocated > sectors. > > This is because during the mirror, we set the dirty bitmap to copy only > sectors allocated above 'base'. In the case of target devices where we > cannot assume unallocated sectors will be read as zeroes, we need to > explicitely zero out this data. > > In order to avoid zeroing out all sectors of the target device prior to > mirroring, we do zeroing as part of the block job. A second dirty > bitmap cache is created, to track sectors that are unallocated above > 'base'. These sectors are then checked for status of BDRV_BLOCK_ZERO > on the target - if they are not, then zeroes are explicitly written. Why do you need a bitmap? You never change the bitmap after initialising it, so couldn't you instead just check the allocation status when you need it? In fact, why do we need two passes? I would have expected that commit dcfb3beb already does the trick, with checking allocation status and writing zeroes during the normal single pass. If that commit fails to solve the problem, I guess I first need to understand why before I can continue reviewing this one... > This only occurs under two conditions: > > 1. 'mode' != "existing" > 2. bdrv_has_zero_init(target) == NULL > > We perform the mirroring through mirror_iteration() as before, except > in two passes. If the above two conditions are met, the first pass > is using the bitmap tracking unallocated sectors, to write the needed > zeroes. Then, the second pass is performed, to mirror the actual data > as before. > > If the above two conditions are not met, then the first pass is skipped, > and only the second pass (the one with the actual data) is performed. > > Signed-off-by: Jeff Cody <jc...@redhat.com> > --- > block/mirror.c | 109 > ++++++++++++++++++++++++++++++++++------------ > blockdev.c | 2 +- > include/block/block_int.h | 3 +- > qapi/block-core.json | 6 ++- > 4 files changed, 87 insertions(+), 33 deletions(-) > > diff --git a/block/mirror.c b/block/mirror.c > index 405e5c4..b599176 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob { > int64_t bdev_length; > unsigned long *cow_bitmap; > BdrvDirtyBitmap *dirty_bitmap; > - HBitmapIter hbi; > + HBitmapIter zero_hbi; > + HBitmapIter allocated_hbi; > + HBitmapIter *hbi; > uint8_t *buf; > QSIMPLEQ_HEAD(, MirrorBuffer) buf_free; > int buf_free_count; > @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob { > int sectors_in_flight; > int ret; > bool unmap; > + bool zero_unallocated; > + bool zero_cycle; > bool waiting_for_io; > } MirrorBlockJob; > > @@ -166,10 +170,10 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > int pnum; > int64_t ret; > > - s->sector_num = hbitmap_iter_next(&s->hbi); > + s->sector_num = hbitmap_iter_next(s->hbi); > if (s->sector_num < 0) { > - bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi); > - s->sector_num = hbitmap_iter_next(&s->hbi); > + bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi); > + s->sector_num = hbitmap_iter_next(s->hbi); > trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap)); > assert(s->sector_num >= 0); > } > @@ -287,7 +291,7 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > */ > if (next_sector > hbitmap_next_sector > && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) { > - hbitmap_next_sector = hbitmap_iter_next(&s->hbi); > + hbitmap_next_sector = hbitmap_iter_next(s->hbi); > } > > next_sector += sectors_per_chunk; > @@ -300,25 +304,34 @@ static uint64_t coroutine_fn > mirror_iteration(MirrorBlockJob *s) > s->sectors_in_flight += nb_sectors; > trace_mirror_one_iteration(s, sector_num, nb_sectors); > > - ret = bdrv_get_block_status_above(source, NULL, sector_num, > - nb_sectors, &pnum); > - if (ret < 0 || pnum < nb_sectors || > - (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) { > - bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors, > - mirror_read_complete, op); > - } else if (ret & BDRV_BLOCK_ZERO) { > - bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors, > - s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > - mirror_write_complete, op); > + if (s->zero_cycle) { > + ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, > &pnum); > + if (!(ret & BDRV_BLOCK_ZERO)) { > + bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors, > + s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > + mirror_write_complete, op); > + } It seems to be expected that this function always involves an AIO request and the completion event is what helps making progress. For the BDRV_BLOCK_ZERO case, we don't do that however. I'm not sure what exactly this means, but at least I think we are applying block job throttling to doing nothing with some areas of the image. > } else { > - assert(!(ret & BDRV_BLOCK_DATA)); > - bdrv_aio_discard(s->target, sector_num, op->nb_sectors, > - mirror_write_complete, op); > + ret = bdrv_get_block_status_above(source, NULL, sector_num, > + nb_sectors, &pnum); > + if (ret < 0 || pnum < nb_sectors || > + (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) { > + bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors, > + mirror_read_complete, op); > + } else if (ret & BDRV_BLOCK_ZERO) { > + bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors, > + s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > + mirror_write_complete, op); > + } else { > + assert(!(ret & BDRV_BLOCK_DATA)); > + bdrv_aio_discard(s->target, sector_num, op->nb_sectors, > + mirror_write_complete, op); > + } > } > return delay_ns; > } Kevin