On Wed, Sep 30, 2020 at 1:49 PM Tomáš Golembiovský <[email protected]> wrote:
>
> Hi,
>
> currently, when we run virt-sparsify on VM or user runs VM with discard
> enabled and when the disk is on block storage in qcow, the results are
> not reflected in oVirt. The blocks get discarded, storage can reuse them
> and reports correct allocation statistics, but oVirt does not. In oVirt
> one can still see the original allocation for disk and storage domain as
> it was before blocks were discarded. This is super-confusing to the
> users because when they check after running virt-sparsify and see the
> same values they think sparsification is not working. Which is not true.

This may be a documentation issue. This is a known limitation of oVirt
thin provisioned storage. We allocate space as needed, but we release
the space only when a volume is deleted.

> It all seems to be because of our LVM layout that we have on storage
> domain. The feature page for discard [1] suggests it could be solved by
> running lvreduce. But this does not seem to be true. When blocks are
> discarded the QCOW does not necessarily change its apparent size, the
> blocks don't have to be removed from the end of the disk. So running
> lvreduce is likely to remove valuable data.

We have an API to (safely) reduce a volume to optimal size:
http://ovirt.github.io/ovirt-engine-api-model/master/#services/disk/methods/reduce

Reducing images depends on the qcow2 image-end-offset. We can tell the
highest offset used by an inactive disk:
https://github.com/oVirt/vdsm/blob/24f646383acb615b090078fc7aeddaf7097afe57/lib/vdsm/storage/blockVolume.py#L403

and reduce the logical volume to this size.

But this will not work, since the qcow2 image-end-offset is not decreased
by virt-sparsify --in-place.

So it is true that sparsify releases unused space at the storage level,
but it does not decrease the qcow2 image allocation, so we cannot reduce
the logical volumes.

> At the moment I don't see how we could achieve the correct values. If
> anyone has any idea feel free to entertain me. The only option seems to
> be to switch to LVM thin pools. Do we have any plans on doing that?

No, thin pools do not support clustering; they can be used only on a
single host. oVirt LVM-based volumes are accessed on multiple hosts at
the same time.

Here is an example sparsify test showing the issue:

Before writing data to the new disk

guest:

# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  104M  9.9G   2% /data

storage:

$ ls -lhs /home/target/2/00
2.1G -rw-r--r--. 1 root root 100G Oct  2 00:57 /home/target/2/00

host:

# qemu-img info /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
image: /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
file format: qcow2
virtual size: 10 GiB (10737418240 bytes)
disk size: 0 B
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
168/163840 = 0.10% allocated, 0.60% fragmented, 0.00% compressed clusters
Image end offset: 12582912

After writing a 5g file to the file system on this disk in the guest

guest:

$ dd if=/dev/zero bs=8M count=640 of=/data/test oflag=direct conv=fsync status=progress

# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  5.2G  4.9G  52% /data

storage:

$ ls -lhs /home/target/2/00
7.1G -rw-r--r--. 1 root root 100G Oct  2 01:06 /home/target/2/00

host:

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
Image end offset: 5381423104

After deleting the 5g file

guest:

# df -h /data
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        10G  104M  9.9G   2% /data

storage:

$ ls -lhs /home/target/2/00
7.1G -rw-r--r--. 1 root root 100G Oct  2 01:12 /home/target/2/00

host:

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
82088/163840 = 50.10% allocated, 5.77% fragmented, 0.00% compressed clusters
Image end offset: 5381423104

After sparsifying the disk

storage:

$ qemu-img check /var/tmp/download.qcow2
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 11927552

$ ls -lhs /home/target/2/00
2.1G -rw-r--r--. 1 root root 100G Oct  2 01:14 /home/target/2/00

host:

# qemu-img check /dev/27f2b637-ffb1-48f9-8f68-63ed227392b9/42cf66df-43ad-4cfa-ab57-a943516155d1
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 4822138880

Allocation decreased from 50% to 0.1%, but the image end offset decreased
only from 5381423104 to 4822138880 (-10.4%).

I don't know if this is a behavior change in virt-sparsify or qemu, or
whether it was always like this.

We had an old and unused sparsifyVolume API in vdsm before 4.4. It did
not use --in-place and was very complicated because of that. But I think
it would work in this case, since qemu-img convert will drop the
unallocated areas.

For example, after downloading the sparsified disk, we get:

$ qemu-img check download.qcow2
No errors were found on the image.
170/163840 = 0.10% allocated, 0.59% fragmented, 0.00% compressed clusters
Image end offset: 11927552

Kevin, is this the expected behavior or a bug in qemu?

The disk I tested is a single qcow2 image without a backing file, so
theoretically qemu can deallocate all the discarded clusters.

Nir
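
To make the numbers from the test concrete, here is a minimal sketch of the arithmetic in Python. It is not vdsm code. The dictionary keys mirror what I believe `qemu-img check --output=json` reports ("allocated-clusters", "total-clusters", "image-end-offset"); the function names are hypothetical, and the 128 MiB extent size is an assumption about the volume group, so adjust it for your setup. The real reduce flow may apply further rules that this sketch ignores.

```python
# Assumed extent size of the volume group (128 MiB). This is an
# assumption for illustration, not a value taken from the test above.
EXTENT_SIZE = 128 * 1024**2


def allocation_percent(check):
    """Percentage of qcow2 clusters that are allocated."""
    return 100.0 * check["allocated-clusters"] / check["total-clusters"]


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return 100.0 * (after - before) / before


def round_up_to_extent(offset, extent_size=EXTENT_SIZE):
    """Smallest multiple of extent_size that can hold offset bytes."""
    extents = -(-offset // extent_size)  # ceiling division
    return extents * extent_size


# Values copied from the qemu-img check output on the host.
before_sparsify = {
    "allocated-clusters": 82088,
    "total-clusters": 163840,
    "image-end-offset": 5381423104,
}
after_sparsify = {
    "allocated-clusters": 170,
    "total-clusters": 163840,
    "image-end-offset": 4822138880,
}

# End offset of the downloaded copy of the sparsified disk.
downloaded_end_offset = 11927552

if __name__ == "__main__":
    print("allocated before: %.2f%%" % allocation_percent(before_sparsify))
    print("allocated after:  %.2f%%" % allocation_percent(after_sparsify))
    print("end offset change: %.1f%%" % percent_change(
        before_sparsify["image-end-offset"],
        after_sparsify["image-end-offset"]))
    # Smallest size the logical volume could shrink to in each case.
    print("reducible to (in place):  %d" % round_up_to_extent(
        after_sparsify["image-end-offset"]))
    print("reducible to (converted): %d" % round_up_to_extent(
        downloaded_end_offset))
```

Under these assumptions the in-place sparsified volume could shrink only to about 4.5 GiB, while a converted copy would fit in a single extent, which is the gap described above.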
