Re: [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

Kevin Wolf Mon, 28 Sep 2015 14:40:11 -0700

Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben:
> During mirror, if the target device does not have support zero
> initialization, a mirror may result in a corrupt image.


I think you want to check this sentence. ("During mirror [...], a
mirror may result [...]")

> For instance, on mirror to a host device with format = raw, whatever
> random data is on the target device will still be there for unallocated
> sectors.
> 
> This is because during the mirror, we set the dirty bitmap to copy only
> sectors allocated above 'base'.  In the case of target devices where we
> cannot assume unallocated sectors will be read as zeroes, we need to
> explicitely zero out this data.
> 
> In order to avoid zeroing out all sectors of the target device prior to
> mirroring, we do zeroing as part of the block job.  A second dirty
> bitmap cache is created, to track sectors that are unallocated above
> 'base'.  These sectors are then checked for status of BDRV_BLOCK_ZERO
> on the target - if they are not, then zeroes are explicitly written.

Why do you need a bitmap? You never change the bitmap after initialising
it, so couldn't you instead just check the allocation status when you
need it?

In fact, why do we need two passes? I would have expected that commit
dcfb3beb already does the trick, with checking allocation status and
writing zeroes during the normal single pass.

If that commit fails to solve the problem, I guess I first need to
understand why before I can continue reviewing this one...

> This only occurs under two conditions:
> 
>     1. 'mode' != "existing"
>     2. bdrv_has_zero_init(target) == NULL
> 
> We perform the mirroring through mirror_iteration() as before, except
> in two passes.  If the above two conditions are met, the first pass
> is using the bitmap tracking unallocated sectors, to write the needed
> zeroes.  Then, the second pass is performed, to mirror the actual data
> as before.
> 
> If the above two conditions are not met, then the first pass is skipped,
> and only the second pass (the one with the actual data) is performed.
> 
> Signed-off-by: Jeff Cody <jc...@redhat.com>
> ---
>  block/mirror.c            | 109 
> ++++++++++++++++++++++++++++++++++------------
>  blockdev.c                |   2 +-
>  include/block/block_int.h |   3 +-
>  qapi/block-core.json      |   6 ++-
>  4 files changed, 87 insertions(+), 33 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 405e5c4..b599176 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob {
>      int64_t bdev_length;
>      unsigned long *cow_bitmap;
>      BdrvDirtyBitmap *dirty_bitmap;
> -    HBitmapIter hbi;
> +    HBitmapIter zero_hbi;
> +    HBitmapIter allocated_hbi;
> +    HBitmapIter *hbi;
>      uint8_t *buf;
>      QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
>      int buf_free_count;
> @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob {
>      int sectors_in_flight;
>      int ret;
>      bool unmap;
> +    bool zero_unallocated;
> +    bool zero_cycle;
>      bool waiting_for_io;
>  } MirrorBlockJob;
>  
> @@ -166,10 +170,10 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>      int pnum;
>      int64_t ret;
>  
> -    s->sector_num = hbitmap_iter_next(&s->hbi);
> +    s->sector_num = hbitmap_iter_next(s->hbi);
>      if (s->sector_num < 0) {
> -        bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> -        s->sector_num = hbitmap_iter_next(&s->hbi);
> +        bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi);
> +        s->sector_num = hbitmap_iter_next(s->hbi);
>          trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
>          assert(s->sector_num >= 0);
>      }
> @@ -287,7 +291,7 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>           */
>          if (next_sector > hbitmap_next_sector
>              && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
> -            hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
> +            hbitmap_next_sector = hbitmap_iter_next(s->hbi);
>          }
>  
>          next_sector += sectors_per_chunk;
> @@ -300,25 +304,34 @@ static uint64_t coroutine_fn 
> mirror_iteration(MirrorBlockJob *s)
>      s->sectors_in_flight += nb_sectors;
>      trace_mirror_one_iteration(s, sector_num, nb_sectors);
>  
> -    ret = bdrv_get_block_status_above(source, NULL, sector_num,
> -                                      nb_sectors, &pnum);
> -    if (ret < 0 || pnum < nb_sectors ||
> -            (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> -        bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> -                       mirror_read_complete, op);
> -    } else if (ret & BDRV_BLOCK_ZERO) {
> -        bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> -                              s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> -                              mirror_write_complete, op);
> +    if (s->zero_cycle) {
> +        ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, 
> &pnum);
> +        if (!(ret & BDRV_BLOCK_ZERO)) {
> +            bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> +                                  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> +                                  mirror_write_complete, op);
> +        }

It seems to be expected that this function always involves an AIO
request and the completion event is what helps making progress. For the
BDRV_BLOCK_ZERO case, we don't do that however. I'm not sure what
exactly this means, but at least I think we are applying block job
throttling to doing nothing with some areas of the image.

>      } else {
> -        assert(!(ret & BDRV_BLOCK_DATA));
> -        bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
> -                         mirror_write_complete, op);
> +        ret = bdrv_get_block_status_above(source, NULL, sector_num,
> +                                          nb_sectors, &pnum);
> +        if (ret < 0 || pnum < nb_sectors ||
> +                (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> +            bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> +                           mirror_read_complete, op);
> +        } else if (ret & BDRV_BLOCK_ZERO) {
> +            bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> +                                  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> +                                  mirror_write_complete, op);
> +        } else {
> +            assert(!(ret & BDRV_BLOCK_DATA));
> +            bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
> +                             mirror_write_complete, op);
> +        }
>      }
>      return delay_ns;
>  }

Kevin

Re: [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present

Reply via email to