On 8/26/24 13:00, Alexandre Derumier via pve-devel wrote:
> This patch series adds support for a new lvmqcow2 storage format.
>
> Currently we can't do snapshots && thin provisioning on shared block
> devices, because lvm-thin can't share its metadata volume. I have a lot of
> on-prem VMware customers where this really blocks the Proxmox migration
> (and they are looking at oVirt/Oracle virtualisation, where it works fine).
>
> It's possible to format a block device with the qcow2 format directly,
> without a filesystem. Red Hat RHEV/oVirt has done this for almost 10 years
> in their vdsm daemon.
>
> For thin provisioning, or to handle the extra space needed by snapshots, we
> need to be able to resize the LVM volume dynamically. The volume is grown
> in chunks of 1GB by default (can be changed). QEMU implements an event to
> send an alert when the write usage reaches a threshold. (The threshold is
> at 50% of the last chunk, so when the VM has 500MB free.)
>
> The resize is async (around 2s), so the user needs to choose a suitable
> chunk size && threshold if the storage is really fast (NVMe for example,
> where you can write more than 500MB in 2s). If the resize is not fast
> enough, the VM will pause with an io-error. pvestatd watches for this
> error, tries to extend again if needed, and resumes the VM.
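For reference, the notification mechanism described above is QEMU's write-threshold
support: the management layer arms an absolute offset on a block node via the
block-set-write-threshold QMP command, and QEMU emits a single BLOCK_WRITE_THRESHOLD
event once guest writes cross it; the threshold is then disarmed and has to be
re-armed after the volume has been grown. Below is a rough, stdlib-only sketch of
that QMP conversation; the socket path, node name and sizes are made-up examples,
not taken from the patches.

#!/usr/bin/env python3
# Sketch of the QMP side of the write-threshold flow (stdlib only).
# Socket path, node name and current size are made-up example values.
import json
import socket

QMP_SOCKET = "/var/run/qemu-server/100.qmp"   # hypothetical QMP socket path
NODE_NAME = "drive-scsi0"                     # hypothetical block node name
CHUNK = 1024**3                               # 1 GiB resize chunk
lv_size = 10 * CHUNK                          # pretend current size of the LV

def qmp_connect(path):
    """Open the QMP socket and negotiate capabilities."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    chan = sock.makefile("rw")
    json.loads(chan.readline())               # server greeting
    chan.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    chan.flush()
    json.loads(chan.readline())               # {"return": {}}
    return chan

def qmp_cmd(chan, name, args):
    """Send one command, skip async events, return the reply."""
    chan.write(json.dumps({"execute": name, "arguments": args}) + "\n")
    chan.flush()
    while True:
        msg = json.loads(chan.readline())
        if "return" in msg or "error" in msg:
            return msg

chan = qmp_connect(QMP_SOCKET)

# Arm the threshold at "500MB left in the last chunk".
qmp_cmd(chan, "block-set-write-threshold",
        {"node-name": NODE_NAME, "write-threshold": lv_size - CHUNK // 2})

# Wait for the event; the series extends the LV at this point (and resumes the
# VM if it already hit io-error) -- this sketch only re-arms the now-disarmed
# threshold one chunk further out.
while True:
    msg = json.loads(chan.readline())
    if msg.get("event") == "BLOCK_WRITE_THRESHOLD":
        lv_size += CHUNK                      # ... lvextend by one chunk here ...
        qmp_cmd(chan, "block-set-write-threshold",
                {"node-name": NODE_NAME,
                 "write-threshold": lv_size - CHUNK // 2})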
Hi,

just my personal opinion, maybe you also want to wait for more feedback from
somebody else... (also I just glanced over the patches, so correct me if I'm
wrong)

I see some problems with this approach (some are maybe fixable, some probably
not?):

* as you mentioned, if the storage is fast enough you have a runaway VM. This
  is IMHO not acceptable, as it leads to VMs that are completely blocked and
  can't do anything. I fear this will generate many support calls asking why
  guests are stopped/hanging...

* the code says containers are supported (rootdir => 1), but I don't see how?
  There is AFAICS no code to handle them in any way... (maybe just falsely
  copied?)

* you lock the local blockextend call, but give it a timeout of 60 seconds.
  What if that timeout expires? The VM again gets completely blocked until
  it's resized by pvestatd.

* IMHO pvestatd is the wrong place to make such a call. It's already doing a
  lot of work in a way where a single storage operation blocks many other
  things (metrics, storage/VM status, ballooning, etc.). Cramming another
  thing in there seems wrong and will only lead to even more people
  complaining about pvestatd not working, only in this case the VMs will be
  stuck in an io-error state indefinitely. I'd rather make a separate
  daemon/program, or somehow integrate it into qmeventd (but then it would
  have to become multi-threaded/multi-process/etc. to not block its other
  purposes).

* there is no cluster locking? You only mention

  ---8<---
  # don't use global cluster lock here, use on native local lvm lock
  --->8---

  but don't configure any lock? (AFAIR LVM cluster locking needs additional
  configuration/daemons?) This *will* lead to errors if multiple VMs on
  different hosts try to resize at the same time. Even with cluster locking,
  this will very soon lead to contention, since storage operations are
  inherently expensive: e.g. if I have 10-100 VMs wanting to resize at the
  same time, some of them will run into a timeout or at least into the
  blocking state. That does not even need much IO, just bad luck when
  multiple VMs go over the threshold within a short time.

All in all, I'm not really sure if the gain (snapshots on shared LVM) is worth
the potential cost in maintenance, support and customer dissatisfaction with
stalled/blocked VMs.

Generally a better approach could be for your customers to use some kind of
shared filesystem (GFS2/OCFS/?). I know those are not really tested or
supported by us, but I would hope that they scale and behave better than
qcow2-on-lvm-with-dynamic-resize.

best regards
Dominik

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel