06.05.2020 16:57, Eric Blake wrote:
On 5/6/20 4:25 AM, Vladimir Sementsov-Ogievskiy wrote:
In case of !VDI_IS_ALLOCATED[], we do zero out the corresponding chunk
of qiov. So, this should be reported as ZERO.
This part makes sense outright, since vdi does not allow backing files, so all
reads of a VDI disk result in data rather than deferral to another layer, and
we indeed read zero in this case. However, it _is_ a behavior change:
previously, 'qemu-io -c map' on a vdi image will show unallocated portions, but
with this change, the entire image now shows as allocated. You need to call
out this fact in the commit message, documenting that it is intentional (it
matches what we do for posix files: since neither files nor vdi support backing
images, we guarantee that all read attempts will be satisfied by this layer
rather than deferring to a backing layer, and thus can treat all data as
allocated in this layer, regardless of whether it is sparse).
Do any existing iotests need an update to reflect change in output? If not,
should we have an iotest covering it?
all passed for me on tmpfs (with some skips)..
After block-status update, it never reports 0, so setting
unallocated_blocks_are_zero doesn't make sense. Drop it.
This is a harder claim. To prove it, we need to inspect all callers that use
unallocated_blocks_are_zero, to see their intent. Fortunately, the list is
small - a mere 2 clients!
block/io.c:bdrv_co_block_status(): if .bdrv_co_block_status sets either _DATA
or _ZERO, block status adds _ALLOCATED; but if the driver did not set either,
we use bdrv_unallocated_blocks_are_zero() in order to set _ZERO but not
_ALLOCATED. This is the code that explains the behavior change mentioned above
(now that vdi is reporting _ZERO instead of 0, block_status is appending
_ALLOCATED instead of querying bdrv_unallocated_blocks_are_zero). But you are
correct that it does not change where _ZERO is reported.
qemu-img.c:img_convert(): calls bdrv_get_info() up front and caches what it
learned from unallocated_blocks_are_zero about the target; later on, in
convert_iteration_sectors(), it uses this information to optimize the case
where the source reads as zero, but the target has a backing file and the range
being written lies beyond the end of the backing file. That is, given the
following scenario:
qemu-img convert input -B backing output
input: ABCD-0E
backing: ACBD
the optimization lets us produce:
output: -BC---E
instead of the larger:
output: -BC--0E
Do we have any iotests that cover this particular scenario, to prove that our
optimization does indeed result in a smaller image file without explicit zeros
written in the tail? Or put another way, if we were to disable the
post_backing_zero optimization in convert_iteration_sectors(), would any
iotests fail? If not, we should really enhance our testsuite.
With that said, we could just as easily achieve the optimization of the
conversion to the tail of a destination with short backing file by checking
block_status to see whether the tail already reads as all zeroes, rather than
relying on unallocated_blocks_are_zero. Even if querying block_status is a
slight pessimization in time, it would avoid regressing in the more important
factor of artificially bloating the destination. I would _really_ love to see
a patch to qemu-img.c reworking the logic to avoid the use of
unallocated_blocks_are_zero first, prior to removing the setting of that field
in the various drivers. Once bdrv_co_block_status() remains as the only client
of the field, then I could accept this patch with a better commit message about
the intentional change in block_status _ALLOCATION behavior.
Actually unallocated_blocks_are_zero doesn't influence on this thing, you should not
check unallocated_blocks_are_zero to understand how to read unallocated area above short
backing (after backing EOF). It's always reads as zeroes. So, patch 7 just use
"true" instead. But yes, I'd better move patch 7 to be the first one.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@virtuozzo.com>
---
block/vdi.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/block/vdi.c b/block/vdi.c
index 0c7835ae70..83471528d2 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -334,7 +334,6 @@ static int vdi_get_info(BlockDriverState *bs,
BlockDriverInfo *bdi)
logout("\n");
bdi->cluster_size = s->block_size;
bdi->vm_state_offset = 0;
- bdi->unallocated_blocks_are_zero = true;
return 0;
}
@@ -536,7 +535,7 @@ static int coroutine_fn
vdi_co_block_status(BlockDriverState *bs,
*pnum = MIN(s->block_size - index_in_block, bytes);
result = VDI_IS_ALLOCATED(bmap_entry);
if (!result) {
- return 0;
+ return BDRV_BLOCK_ZERO;
}
*map = s->header.offset_data + (uint64_t)bmap_entry * s->block_size +
--
Best regards,
Vladimir