> Alexandre Derumier via pve-devel <pve-devel@lists.proxmox.com> hat am 
> 26.08.2024 13:00 CEST geschrieben:
> This patch series adds support for a new lvmqcow2 storage format.
> 
> Currently, we can't do snapshots && thin provisioning on shared block
> devices, because lvm-thin can't share its metadata volume. I have a lot
> of on-prem VMware customers for whom this really blocks the Proxmox
> migration (and they are looking at oVirt/Oracle virtualisation instead,
> where it works fine).
> 
> It's possible to format a block device with the qcow2 format directly,
> without a filesystem.
> Red Hat RHEV/oVirt has used this for almost 10 years in its vdsm daemon.
> 
> For thin provisioning, or to handle the extra space needed by snapshots,
> we need to be able to resize the LVM volume dynamically.
> The volume is increased in chunks of 1GB by default (configurable).
> QEMU implements events that send an alert when write usage reaches a
> threshold.
> (The threshold is at 50% of the last chunk, i.e. when the VM has 500MB
> free.)
> 
> The resize is async (around 2s), so the user needs to choose a suitable
> chunk size && threshold if the storage is really fast (NVMe, for example,
> where you can write more than 500MB in 2s).
> 
> If the resize is not fast enough, the VM will pause with an io-error.
> pvestatd watches for this error, tries to extend the volume again if
> needed, and resumes the VM.

I agree with Dominik about the downsides of this approach.

We had a brief chat this morning and came up with a possible alternative that 
would still allow snapshots (even if thin-provisioning would be out of scope):

- allocate the volume with the full size and put a fully pre-allocated qcow2 
file on it
- no need to monitor regular guest I/O, it's guaranteed that the qcow2 file can 
be fully written
- when creating a snapshot
-- check the actual usage of the qcow2 file
-- extend the underlying volume so that the total size is current usage + size 
exposed to the guest
-- create the actual (qcow2-internal) snapshot
- still no need to monitor guest I/O, the underlying volume should be big 
enough to overwrite all data
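
the steps above could look roughly like this (a sketch only — the volume name
and the use of `qemu-img check`'s "Image end offset" as the usage metric are my
assumptions, not something settled):

```shell
# hypothetical volume name vm-101-disk-0
# 1. get the actual qcow2 usage; `qemu-img check` prints an "Image end
#    offset", which should approximate the allocated size:
#      qemu-img check /dev/vg/vm-101-disk-0 | grep 'Image end offset'
# suppose that reports 5 GiB used on a disk exposing 20 GiB to the guest:
usage=$((5 * 1024 * 1024 * 1024))
virtual=$((20 * 1024 * 1024 * 1024))

# 2. extend the LV so that current usage + a full overwrite of the guest
#    size always fits:
new_size=$((usage + virtual))
echo "extend to ${new_size} bytes"
#      lvextend -L ${new_size}b /dev/vg/vm-101-disk-0

# 3. create the qcow2-internal snapshot (offline example; a running VM
#    would go through QMP instead):
#      qemu-img snapshot -c snap1 /dev/vg/vm-101-disk-0
```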

this would give us effectively the same semantics as thick-provisioned zvols, 
which also always reserve enough space at snapshot creation time to allow a 
full overwrite of the whole zvol. if the underlying volume cannot be extended 
by the required space, snapshot creation would fail.

some open questions:
- do we actually get enough information about space usage out of the qcow2 file? 
(I think so, but haven't checked in detail)
- is there a way to compact/shrink either when removing snapshots, or as 
(potentially expensive) standalone action (worst case, compact by copying the 
whole disk?)
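
regarding the second question: as far as I know qcow2 has no in-place shrink
for freed clusters, but the full-copy worst case does compact, since
`qemu-img convert` only copies allocated data. A sketch (volume names are
placeholders):

```shell
# offline compaction by copying; the destination ends up compact because
# qemu-img convert skips unallocated/zero clusters:
#   lvcreate -L 20G -n vm-101-disk-0-new vg
#   qemu-img convert -f qcow2 -O qcow2 /dev/vg/vm-101-disk-0 /dev/vg/vm-101-disk-0-new
#   lvremove vg/vm-101-disk-0     # after switching the VM over
# afterwards the new LV could be reduced down to the copied image's
# end offset.
```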

another, less involved approach would be to over-allocate the volume to provide 
a fixed, limited amount of slack for snapshots (e.g., "allocate 50% extra space 
for snapshots" when creating a guest volume) - but that has all the usual 
downsides of thin-provisioning (the guest is lied to about the disk size, and 
can run into weird error states when space runs out) and is less flexible.

what do you think about the above approaches?


_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
