On Wed, Jun 9, 2021 at 9:01 PM Eric Blake <ebl...@redhat.com> wrote: > > When trying to reconstruct a qcow2 chain using information provided > over NBD, ovirt had been relying on an unsafe assumption that any > portion of the qcow2 file advertised as sparse would defer to the > backing image; this worked with what qemu 5.2 reports for a qcow2 BSD > loaded with "backing":null. However, in 6.0, commit 0da9856851 (nbd: > server: Report holes for raw images) also had a side-effect of > reporting unallocated zero clusters in qcow2 files as sparse. This > change is correct from the NBD spec perspective (advertising bits has > always been optional based on how much information the server has > available, and should only be used to optimize behavior when a bit is > set, while not assuming semantics merely because a bit is clear), but > means that a qcow2 file that uses an unallocated zero cluster to > override a backing file now shows up as sparse over NBD, and causes > ovirt to fail to reproduce that cluster (ie. ovirt was assuming it > only had to write clusters where the bit was clear, and the 6.0 > behavior change shows the flaw in that assumption). > > The correct fix is for ovirt to additionally use the > qemu:allocation-depth metadata context added in 5.2: after all, the > actual determination for what is needed to recreate a qcow2 file is > not whether a cluster is sparse, but whether the allocation-depth > shows the cluster to be local. But reproducing an image is more > efficient when handling known-zero clusters, which means that ovirt > has to track both base:allocation and qemu:allocation-depth metadata > contexts simultaneously. While NBD_CMD_BLOCK_STATUS is just fine > sending back information for two contexts in parallel, it comes with > some bookkeeping overhead at the client side: the two contexts need > not report the same length of replies, and it involves more network > traffic.
Since this change is not simple, and the chance that we also get the dirty bitmap included in the result seems to be very low, I decided to check the direction of merging multiple extents. I started with merging "base:allocation" and "qemu:dirty-bitmap:xxx" since we already have both. It was not hard to do, although it is not completely tested yet. Here is the merging code: https://gerrit.ovirt.org/c/ovirt-imageio/+/115216/1/daemon/ovirt_imageio/_internal/nbdutil.py To make merging easy and safe, we map the NBD_STATE_DIRTY bit to a private bit so it cannot clash with the NBD_STATE_HOLE bit: https://gerrit.ovirt.org/c/ovirt-imageio/+/115215/1/daemon/ovirt_imageio/_internal/nbd.py Here is a functional test using qemu-nbd showing that it works: https://gerrit.ovirt.org/c/ovirt-imageio/+/115216/1/daemon/test/client_test.py I'll try to use "qemu:allocation-depth" in a similar way next week, probably mapping depth > 0 to EXTENT_EXISTS, to use when reporting holes in single qcow2 images. If this is successful, we can start using this in the next ovirt release, and we don't need "qemu:joint-allocation". Nir