On 22.12.2015 17:46, Kevin Wolf wrote: > Enough innocent images have died because users called 'qemu-img snapshot' > while > the VM was still running. Educating the users doesn't seem to be a working > strategy, so this series adds locking to qcow2 that refuses to access the > image > read-write from two processes. > > Eric, this will require a libvirt update to deal with qemu crashes which leave > locked images behind. The simplest thinkable way would be to unconditionally > override the lock in libvirt whenever the option is present. In that case, > libvirt VMs would be protected against concurrent non-libvirt accesses, but > not > the other way round. If you want more than that, libvirt would have to check > somehow if it was its own VM that used the image and left the lock behind. I > imagine that can't be too hard either. > > Also note that this kind of depends on Max's bdrv_close_all() series, but only > in order to pass test case 142. This is not a bug in this series, but a > preexisting one (bs->file can be closed before bs), and it becomes apparent > when qemu fails to unlock an image due to this bug. Max's series fixes this.
Here's another crazy idea around the issue which kind of locking to choose: We could implement both. Our main issue is that we currently do not protect users not using any management stack against wrongly invoking a second qemu/qemu-img instance on a qcow2 image that's already in use. We have to assume that these users will not specify any locking option, so our default choice needs to work here. Both flock() and format locking would work here fine. Then there are users which use old libvirt versions and new qemu versions. I know Eric doesn't care about this, but I do, because if I had a full-blown management stack, I'd rather update qemu for better performance or security fixes or whatever than the rest of the whole stack. Also, if we need coordination between libvirt and qemu, that will delay release of whatever implementation we do, which is not horrible, but not desirable either. Therefore, it would be nice if the default choice would work with libvirt today, which only leaves format locking. Then there are users for whom format locking is not the right choice. They often do have a management layer, so they don't actually need the features provided by a locking mechanism in qemu (libvirt does it anyway), and I do think we can assume them willing to set some runtime options. Therefore, the default does not matter here, but we should implement flock() because once libvirt can cope with qemu invoking that by itself, it may offer them better protection. This is why I think implementing both format and protocol locking may be a good idea. I think we should enable format locking and disable flock() by default, but of course you should be able to override this choice at runtime. Thoughts? Max
signature.asc
Description: OpenPGP digital signature