Re: [PATCH v2 1/2] qcow2: handle discard-no-unref in measure

Jean-Louis Dupond Mon, 17 Feb 2025 08:53:14 -0800

Hi,

First of all, sorry for the huge delay; I didn't have time to follow up on this lately. It also got a lower priority, as we don't hit it often and have a fairly easy workaround (fill the empty blocks again in the snapshot by writing data to the disk).


On 7/10/24 14:58, Hanna Czenczek wrote:

On 05.06.24 15:25, Jean-Louis Dupond wrote:
When doing a measure on an image with a backing file and
discard-no-unref is enabled, the code should take this into account.
That doesn’t make sense to me. As far as I understand, 'measure' is supposed to report how much space you need for a given image, i.e. if you were to convert it to a new image. discard-no-unref doesn’t factor into that, because for a 'convert' target (a new image), nothing can be discarded.
Reading the issue, I understand that oVirt uses measure to determine the size of the target of a 'commit' operation. Seems a bit like abuse to me, precisely because of the issue you’re facing. More specifically, a 'commit' operation is a complex thing with a lot of variables, so the outcome depends on a lot.

Correct. oVirt uses the measure command to find out how big the destination volume needs to be when running a commit/merge of 2 disks. This way it can resize the container (a Logical Volume here) to the correct size so that the commit succeeds.

For example, this patch just checks the discard-no-unref setting on the top image. But AFAIU it doesn’t matter what the setting on the top image is, it matters what the setting on the commit target is. 'measure' can’t know this because it doesn’t know what the commit target is. As far as I can see, this patch actually assumes the commit target is the first backing image (it specifically checks in the image whether a block is allocated) – why?

By default it would indeed check the top image, but not when passing the complex json parameters to qemu-img measure.

For example:
./build/qemu-img create -f qcow2 /tmp/test.qcow2 128M

./build/qemu-io -c 'open /tmp/test.qcow2' -c 'write 0 8M' -c 'write 56M 20M' -c 'write 10M 8M' -c 'write 24M 32M'

./build/qemu-img create -f qcow2 -b /tmp/test.qcow2 -F qcow2 /tmp/test_snap.qcow2

./build/qemu-io -c 'open -o discard=unmap,discard-no-unref=on /tmp/test_snap.qcow2' -c 'write 16M 8M' -c 'discard 60M 20M' -c 'write 84M 10M'



The following commands will give the current output:

[jean-louis@lt-jeanlouis qemu]$ ./build/qemu-img measure --output json -O qcow2 'json:{"file": {"driver": "file", "filename": "/tmp/test_snap.qcow2"}, "driver": "qcow2", "discard": "unmap", "discard-no-unref": true, "backing": {"driver": "qcow2", "discard-no-unref": false, "file": {"driver": "file", "filename": "/tmp/test.qcow2"}, "backing": null}}'

{
    "bitmaps": 0,
    "required": 71630848,
    "fully-allocated": 134545408
}

[jean-louis@lt-jeanlouis qemu]$ ./build/qemu-img measure --output json -O qcow2 /tmp/test_snap.qcow2

{
    "bitmaps": 0,
    "required": 71630848,
    "fully-allocated": 134545408
}

[jean-louis@lt-jeanlouis qemu]$ ./build/qemu-img measure --output json -O qcow2 'json:{"file": {"driver": "file", "filename": "/tmp/test_snap.qcow2"}, "driver": "qcow2", "backing": {"driver": "qcow2", "file": {"driver": "file", "filename": "/tmp/test.qcow2"}, "backing": null}}'

{
    "bitmaps": 0,
    "required": 71630848,
    "fully-allocated": 134545408
}

That is because these invocations do not take the discard-no-unref flag into account, so they give the same output as the current version.



But when running measure with the following options:

./build/qemu-img measure --output json -O qcow2 'json:{"file": {"driver": "file", "filename": "/tmp/test_snap.qcow2"}, "driver": "qcow2", "discard": "unmap", "discard-no-unref": true, "backing": {"driver": "qcow2", "discard-no-unref": true, "file": {"driver": "file", "filename": "/tmp/test.qcow2"}, "backing": null}}'


It will give a bigger required size:
{
    "bitmaps": 0,
    "required": 88408064,
    "fully-allocated": 134545408
}

Why? If a block has already been allocated (either with data or as an allocated ZERO block), we want to include its size in the calculation, because with discard-no-unref an allocated block will not be reused for some other cluster, so it is not available for data from the snapshot layer. So if the cluster was not yet allocated in the destination image, a new cluster will need to be allocated to fit the new data from the snapshot layer.
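
To make the numbers above easier to follow, here is a rough toy model of the example in C. It is only a sketch and not QEMU code: it works at 1 MiB granularity, ignores qcow2 metadata, and the names model_data_bytes, mark and base_no_unref are made up for illustration (base_no_unref stands for discard-no-unref being set on the commit target).

```c
#include <stdbool.h>
#include <stdint.h>

#define MIB (1024 * 1024LL)
#define IMAGE_MIB 128            /* virtual size used in the example above */

/* Mark the range [start, start + len) MiB as set in a per-MiB map. */
static void mark(bool *map, int start, int len)
{
    for (int i = start; i < start + len; i++) {
        map[i] = true;
    }
}

/*
 * Toy model, not QEMU code: guest data bytes needed when committing
 * test_snap.qcow2 into test.qcow2, ignoring all qcow2 metadata.
 */
static int64_t model_data_bytes(bool base_no_unref)
{
    bool base[IMAGE_MIB] = { false };      /* allocated in test.qcow2 */
    bool top_data[IMAGE_MIB] = { false };  /* written in test_snap.qcow2 */
    bool top_zero[IMAGE_MIB] = { false };  /* discarded in test_snap.qcow2 */
    int64_t mib = 0;

    /* Same ranges as the qemu-io commands in the example. */
    mark(base, 0, 8);
    mark(base, 56, 20);
    mark(base, 10, 8);
    mark(base, 24, 32);
    mark(top_data, 16, 8);
    mark(top_zero, 60, 20);
    mark(top_data, 84, 10);

    for (int i = 0; i < IMAGE_MIB; i++) {
        if (top_data[i]) {
            mib++;                 /* data written in the snapshot */
        } else if (top_zero[i]) {
            if (base_no_unref && base[i]) {
                mib++;             /* zeroed, but the base keeps its cluster */
            }
        } else if (base[i]) {
            mib++;                 /* base data shows through */
        }
    }
    return mib * MIB;
}
```

In this model, model_data_bytes(false) is 68 MiB and model_data_bytes(true) is 84 MiB. The 16 MiB difference is the discarded 60M..76M range that is allocated in the base, and it equals 88408064 - 71630848 = 16777216 bytes. If the model is right, the remaining 327680 bytes in both outputs would be qcow2 metadata.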

So to me that means if 'measure' is supposed to give reliable data on the commit case, it needs to be extended. Best thing I can come up with off the top of my head would be to add an option e.g. 'commit=<target-node-name>', so we (A) know that we’re looking at a commit and not a convert, and (B) we know what data will be collapsed into which image and where we need to check for discard-no-unref.

I think that is what can be achieved by using the json argument, because there we can specify the target with its flags. It is then the responsibility of oVirt (or whatever other tool) to pass the correct flags.


Hanna

Thanks for the review

Jean-Louis

If for example you have a snapshot image with a base, and you do a
discard within the snapshot, it will be ZERO and ALLOCATED, but without
host offset.
Now if we commit this snapshot, and the clusters in the base image have
a host offset, the clusters will only be set to ZERO, but the host offset
will not be cleared.
Therefore non-data clusters in the top image need to check the
base to see if space will be freed or not, to have a correct measure
output.

Bug-Url: https://gitlab.com/qemu-project/qemu/-/issues/2369
Signed-off-by: Jean-Louis Dupond <jean-lo...@dupond.be>
---
  block/qcow2.c | 32 +++++++++++++++++++++++++++++---
  1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 956128b409..50354e5b98 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -5163,9 +5163,16 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
          } else {
              int64_t offset;
              int64_t pnum = 0;
+            BlockDriverState *parent = bdrv_filter_or_cow_bs(in_bs);
+            BDRVQcow2State *s = NULL;
+
+            if (parent) {
+                s = parent->opaque;
+            }

              for (offset = 0; offset < ssize; offset += pnum) {
                  int ret;
+                int retp = 0;

                  ret = bdrv_block_status_above(in_bs, NULL, offset,
                                                ssize - offset, &pnum, NULL,
@@ -5176,10 +5183,29 @@ static BlockMeasureInfo *qcow2_measure(QemuOpts *opts, BlockDriverState *in_bs,
                      goto err;
                  }

-                if (ret & BDRV_BLOCK_ZERO) {
+                /* If we have a parent in the chain and the current block is not data,
+                 * then we want to check the allocation state of the parent block.
+                 * If it has a valid offset, then we want to include it into
+                 * the calculation, cause blocks with an offset will not be freed when
+                 * committing the top into base with discard-no-unref enabled.
+                 */
+                if (parent && s->discard_no_unref && !(ret & BDRV_BLOCK_DATA)) {
+                        int64_t pnum_parent = 0;
+                        retp = bdrv_block_status_above(parent, NULL, offset,
+                                              ssize - offset, &pnum_parent, NULL,
+                                              NULL);
+                        /* If the parent continuous block is smaller, use that pnum,
+                         * so the next iteration starts with the smallest offset.
+                         */
+                        if (pnum_parent < pnum) {
+                            pnum = pnum_parent;
+                        }
+                }
+                if (ret & BDRV_BLOCK_ZERO && !parent && !(parent && s->discard_no_unref)) {
                      /* Skip zero regions (safe with no backing file) */
-                } else if ((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
-                           (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) {
+                } else if (((ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ==
+                           (BDRV_BLOCK_DATA | BDRV_BLOCK_ALLOCATED)) ||
+                           (retp & BDRV_BLOCK_OFFSET_VALID)) {
                      /* Extend pnum to end of cluster for next iteration */
                      pnum = ROUND_UP(offset + pnum, cluster_size) - offset;
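
As an aid for reading the changed condition, the per-range decision in the hunk above can be restated as a standalone predicate. This is only a sketch of my reading of the patch: the SK_* flag values and the function name sketch_counts_as_required are hypothetical stand-ins (QEMU defines its own BDRV_BLOCK_* constants), chain_status plays the role of ret, and base_status plays the role of retp.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch only, not QEMU's definitions. */
enum {
    SK_DATA         = 1 << 0,
    SK_ZERO         = 1 << 1,
    SK_OFFSET_VALID = 1 << 2,
    SK_ALLOCATED    = 1 << 3,
};

/*
 * Returns true if a range would be added to the required size.
 * chain_status: block status of the whole chain at this offset.
 * base_status:  block status of the backing image at this offset.
 * has_base:     the measured image has a backing image.
 * base_no_unref: discard-no-unref is set on that backing image.
 */
static bool sketch_counts_as_required(int chain_status, int base_status,
                                      bool has_base, bool base_no_unref)
{
    const int data_alloc = SK_DATA | SK_ALLOCATED;

    /* Zero regions are skipped only when there is no backing image. */
    if ((chain_status & SK_ZERO) && !has_base) {
        return false;
    }
    /* Allocated data is always counted. */
    if ((chain_status & data_alloc) == data_alloc) {
        return true;
    }
    /* Non-data range whose cluster in the base keeps a host offset. */
    if (has_base && base_no_unref && !(chain_status & SK_DATA) &&
        (base_status & SK_OFFSET_VALID)) {
        return true;
    }
    return false;
}
```

With these stand-in flags, a range that is ZERO in the chain but has a valid host offset in the base is counted only when base_no_unref is true, which is the case the commit message describes.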
