Re: [PATCH 3/6] mirror: Skip writing zeroes when target is already zero

Eric Blake Wed, 16 Apr 2025 14:52:47 -0700

On Tue, Apr 15, 2025 at 06:59:15PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 11.04.25 04:04, Eric Blake wrote:
> > When mirroring, the goal is to ensure that the destination reads the
> > same as the source; this goal is met whether the destination is sparse
> > or fully-allocated.  However, if the destination cannot efficiently
> > write zeroes, then any time the mirror operation wants to copy zeroes
> > from the source to the destination (either during the background over
> > sparse regions when doing a full mirror, or in the foreground when the
> > guest actively writes zeroes), we were causing the destination to
> > fully allocate that portion of the disk, even if it already read as
> > zeroes.
> > 
> > -    ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
> > -                               op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
> > +    if (op->s->zero_bitmap) {
> > +        unsigned long last = (op->offset + op->bytes) / op->s->granularity;
> 
> Maybe, call it "end", not "last, as it's not last element of the range, but 
> first after the range.


Works for me.

> 
> Also, seems we need still do DIV_ROUND_UP, for ..
> 
> > +        assert(QEMU_IS_ALIGNED(op->offset, op->s->granularity));
> > +        assert(QEMU_IS_ALIGNED(op->bytes, op->s->granularity) ||
> > +               op->offset + op->bytes == op->s->bdev_length);
> 
> .. ^ this case, when bytes is unaligned to granularity but aligned to 
> bdev_length.

Hmm, good point.  I'll have to revisit the logic and make sure it
still works in the case of a final partial region.

> 
> > +        if (find_next_zero_bit(op->s->zero_bitmap, last,
> > +                               op->offset / op->s->granularity) == last) {
> > +            write_needed = false;
> > +        }
> > +    }
> > +    if (write_needed) {
> > +        ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
> > +                                   op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
> > +    }
> > +    if (ret >= 0 && op->s->zero_bitmap) {
> > +        bitmap_set(op->s->zero_bitmap, op->offset / op->s->granularity,
> > +                   op->bytes / op->s->granularity);
> 
> and here we want to align up bytes, for the corner case
> 
> > +    }
> 
> Also, I'm not sure, what guarantees we have in case of write-zeroes failure. 
> Should we clear the bitmap in this case, like we do MIRROR_METHOD_COPY and 
> MIRROR_METHOD_DISCARD below

My goal was to clear the bitmap before any action that (potentially)
destroys zero regions (writes and discards), even if that action
fails; but only set the bitmap after success on any action that
creates zero regions (write-zero).  If the write-zero fails, and the
bitmap was already set, it doesn't matter that we left the bitmap set
(a write zeroes that fails AND causes the disk to no longer read as
zero should not happen); otherwise, we are correct in leaving the
bitmap unset on failure (even if the write-zeros partially succeeded,
the pessimistic answer of assuming a region is non-zero is always
correct even if sometimes slower).

> 
> >       mirror_write_complete(op, ret);
> >   }
> > 
> > @@ -441,6 +459,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, 
> > int64_t offset,
> >       Coroutine *co;
> >       int64_t bytes_handled = -1;
> > 
> > +    assert(QEMU_IS_ALIGNED(offset, s->granularity));
> > +    assert(QEMU_IS_ALIGNED(bytes, s->granularity) ||
> > +           offset + bytes == s->bdev_length);
> >       op = g_new(MirrorOp, 1);
> >       *op = (MirrorOp){
> >           .s              = s,
> > @@ -452,12 +473,21 @@ static unsigned mirror_perform(MirrorBlockJob *s, 
> > int64_t offset,
> > 
> >       switch (mirror_method) {
> >       case MIRROR_METHOD_COPY:
> > +        if (s->zero_bitmap) {
> > +            bitmap_clear(s->zero_bitmap, offset / s->granularity,
> > +                         bytes / s->granularity);
> 
> again, align up for corner case
> 
> > +        }
> >           co = qemu_coroutine_create(mirror_co_read, op);
> >           break;
> >       case MIRROR_METHOD_ZERO:
> > +        /* s->zero_bitmap handled in mirror_co_zero */
> >           co = qemu_coroutine_create(mirror_co_zero, op);
> >           break;
> >       case MIRROR_METHOD_DISCARD:
> > +        if (s->zero_bitmap) {
> > +            bitmap_clear(s->zero_bitmap, offset / s->granularity,
> > +                         bytes / s->granularity);
> > +        }
> >           co = qemu_coroutine_create(mirror_co_discard, op);
> >           break;
> >       default:
> > @@ -851,10 +881,17 @@ static int coroutine_fn GRAPH_UNLOCKED 
> > mirror_dirty_init(MirrorBlockJob *s)
> >       }
> >       bdrv_graph_co_rdunlock();
> > 
> > -    if (s->zero_target && ret <= 0) {
> > +    if (s->zero_target) {
> > +        int64_t length;
> > +
> >           if (ret < 0) {
> >               return ret;
> >           }
> > +        length = DIV_ROUND_UP(s->bdev_length, s->granularity);
> > +        s->zero_bitmap = bitmap_new(length);
> > +        if (ret > 0) {
> > +            bitmap_set(s->zero_bitmap, 0, length);
> 
> hmm, we should not continue zeroing target in case of ret > 0.

Correct. If ret > 0, we set zero_bitmap to record that starting state,
and then want to proceed on with the rest of the code that populates
only the non-zero portions of the dirty bitmap.

> 
> I didn't like that we set ret in one if-block, and handle in another, but now 
> it gets even more confusing.
> 
> Maybe, just move bdrv_co_is_zero_fast() call into big "if (s->zero_target) {" 
> ?

That would require grabbing another RDLOCK - but I'll try and see if
that makes things easier to read.

...
> > +        if (job->zero_bitmap && ret >= 0) {
> > +            bitmap_set(job->zero_bitmap, dirty_bitmap_offset / 
> > job->granularity,
> > +                       (dirty_bitmap_end - dirty_bitmap_offset) /
> > +                       job->granularity);
> > +        }
> 
> Same thing, probably we should clear the bitmap in case of write failure.

Why? Either the bitmap was already clear, or we had a write failure on
writing zeroes on top of zeroes but the region should still read as
zeroes.

> 
> >           break;
> > 
> >       case MIRROR_METHOD_DISCARD:
> > +        if (job->zero_bitmap) {
> > +            bitmap_clear(job->zero_bitmap, zero_bitmap_offset,
> > +                         zero_bitmap_end - zero_bitmap_offset);
> > +        }
> >           assert(!qiov);
> >           ret = blk_co_pdiscard(job->target, offset, bytes);
> >           break;
> > @@ -1436,10 +1498,10 @@ do_sync_target_write(MirrorBlockJob *job, 
> > MirrorMethod method,
> >            * at function start, and they must be still dirty, as we've 
> > locked
> >            * the region for in-flight op.
> >            */
> > -        bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
> > -        bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
> > -        bdrv_set_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
> > -                              bitmap_end - bitmap_offset);
> > +        dirty_bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
> > +        dirty_bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
> > +        bdrv_set_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
> > +                              dirty_bitmap_end - dirty_bitmap_offset);
> 
> Not really matter, but still, renaming in a separate patch would make this 
> one a bit simpler.

Okay, I'll split the variable rename.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org

Re: [PATCH 3/6] mirror: Skip writing zeroes when target is already zero

Reply via email to