On 8/26/24 13:00, Alexandre Derumier via pve-devel wrote:
> This patch series adds support for a new lvmqcow2 storage format.
>
> Currently we can't do snapshots && thin provisioning on shared block
> devices, because lvm-thin can't share its metadata volume. I have a lot of
> on-prem VMware customers where this really blocks the Proxmox migration
> (and they are looking at oVirt/Oracle virtualisation, where it works fine).
>
> It's possible to format a block device with the qcow2 format directly,
> without a filesystem. Red Hat RHEV/oVirt has done this for almost 10 years
> in their vdsm daemon.
>
> For thin provisioning, or to handle the extra space needed by snapshots, we
> need to be able to resize the LVM volume dynamically. The volume is grown
> in chunks of 1GB by default (can be changed). QEMU implements an event to
> send an alert when the write usage reaches a threshold. (The threshold is
> at 50% of the last chunk, so when the VM has 500MB free.)
>
> The resize is async (around 2s), so the user needs to choose a suitable
> chunk size && threshold if the storage is really fast (NVMe for example,
> where you can write more than 500MB in 2s). If the resize is not fast
> enough, the VM will pause with an io-error. pvestatd watches for this
> error, tries to extend again if needed, and resumes the VM.
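For reference, the notification mechanism described above is QEMU's write-threshold
support: the management layer arms an absolute offset on a block node via the
block-set-write-threshold QMP command, and QEMU emits a single BLOCK_WRITE_THRESHOLD
event once guest writes cross it; the threshold is then disarmed and has to be
re-armed after the volume has been grown. Below is a rough, stdlib-only sketch of
that QMP conversation; the socket path, node name and sizes are made-up examples,
not taken from the patches.

#!/usr/bin/env python3
# Sketch of the QMP side of the write-threshold flow (stdlib only).
# Socket path, node name and current size are made-up example values.
import json
import socket

QMP_SOCKET = "/var/run/qemu-server/100.qmp"   # hypothetical QMP socket path
NODE_NAME = "drive-scsi0"                     # hypothetical block node name
CHUNK = 1024**3                               # 1 GiB resize chunk
lv_size = 10 * CHUNK                          # pretend current size of the LV

def qmp_connect(path):
    """Open the QMP socket and negotiate capabilities."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(path)
    chan = sock.makefile("rw")
    json.loads(chan.readline())               # server greeting
    chan.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
    chan.flush()
    json.loads(chan.readline())               # {"return": {}}
    return chan

def qmp_cmd(chan, name, args):
    """Send one command, skip async events, return the reply."""
    chan.write(json.dumps({"execute": name, "arguments": args}) + "\n")
    chan.flush()
    while True:
        msg = json.loads(chan.readline())
        if "return" in msg or "error" in msg:
            return msg

chan = qmp_connect(QMP_SOCKET)

# Arm the threshold at "500MB left in the last chunk".
qmp_cmd(chan, "block-set-write-threshold",
        {"node-name": NODE_NAME, "write-threshold": lv_size - CHUNK // 2})

# Wait for the event; the series extends the LV at this point (and resumes the
# VM if it already hit io-error) -- this sketch only re-arms the now-disarmed
# threshold one chunk further out.
while True:
    msg = json.loads(chan.readline())
    if msg.get("event") == "BLOCK_WRITE_THRESHOLD":
        lv_size += CHUNK                      # ... lvextend by one chunk here ...
        qmp_cmd(chan, "block-set-write-threshold",
                {"node-name": NODE_NAME,
                 "write-threshold": lv_size - CHUNK // 2})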
Hi,

just my personal opinion, maybe you also want to wait for more feedback from
somebody else... (also I just glanced over the patches, so correct me if I'm
wrong)

I see some problems with this approach (some are maybe fixable, some probably
not?):

* as you mentioned, if the storage is fast enough you have a runaway VM. This
  is IMHO not acceptable, as it leads to VMs that are completely blocked and
  can't do anything. I fear this will generate many support calls asking why
  guests are stopped/hanging...

* the code says containers are supported (rootdir => 1), but I don't see how?
  There is AFAICS no code to handle them in any way... (maybe just falsely
  copied?)

* you lock the local blockextend call, but give it a timeout of 60 seconds.
  What if that timeout expires? The VM again gets completely blocked until
  it's resized by pvestatd.

* IMHO pvestatd is the wrong place to make such a call. It's already doing a
  lot of work in a way where a single storage operation blocks many other
  things (metrics, storage/VM status, ballooning, etc.). Cramming another
  thing in there seems wrong and will only lead to even more people
  complaining about pvestatd not working, only in this case the VMs will be
  stuck in an io-error state indefinitely. I'd rather make a separate
  daemon/program, or somehow integrate it into qmeventd (but then it would
  have to become multi-threaded/multi-process/etc. to not block its other
  purposes).

* there is no cluster locking? You only mention

  ---8<---
  # don't use global cluster lock here, use on native local lvm lock
  --->8---

  but don't configure any lock? (AFAIR LVM cluster locking needs additional
  configuration/daemons?) This *will* lead to errors if multiple VMs on
  different hosts try to resize at the same time. Even with cluster locking,
  this will very soon lead to contention, since storage operations are
  inherently expensive: e.g. if I have 10-100 VMs wanting to resize at the
  same time, some of them will run into a timeout or at least into the
  blocking state. That does not even need much IO, just bad luck when
  multiple VMs go over the threshold within a short time.

All in all, I'm not really sure if the gain (snapshots on shared LVM) is worth
the potential cost in maintenance, support and customer dissatisfaction with
stalled/blocked VMs.

Generally a better approach could be for your customers to use some kind of
shared filesystem (GFS2/OCFS/?). I know those are not really tested or
supported by us, but I would hope that they scale and behave better than
qcow2-on-lvm-with-dynamic-resize.

best regards
Dominik

_______________________________________________
pve-devel mailing list
pve-devel@lists.proxmox.com
https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel