On Wed, Dec 23, 2015 at 11:14:12AM +0800, Fam Zheng wrote:
> On Tue, 12/22 17:46, Kevin Wolf wrote:
> > Enough innocent images have died because users called 'qemu-img snapshot'
> > while the VM was still running. Educating the users doesn't seem to be a
> > working strategy, so this series adds locking to qcow2 that refuses to
> > access the image read-write from two processes.
> >
> > Eric, this will require a libvirt update to deal with qemu crashes which
> > leave locked images behind. The simplest thinkable way would be to
> > unconditionally override the lock in libvirt whenever the option is
> > present. In that case, libvirt VMs would be protected against concurrent
> > non-libvirt accesses, but not the other way round. If you want more than
> > that, libvirt would have to check somehow if it was its own VM that used
> > the image and left the lock behind. I imagine that can't be too hard
> > either.
>
> The motivation is great, but I'm not sure I like the side-effect that an
> unclean shutdown will require a "forced" open, because it makes using qcow2
> in development cumbersome, and like you said, management/user also needs to
> handle this explicitly. This is a bit of a personal preference, but it's
> strong enough that I want to speak up.
Yeah, I am also not really a big fan of locking mechanisms which are not
automatically cleaned up on process exit. On the other hand, you could say
that people who choose to run qemu-img manually are already taking fate into
their own hands, and ending up with a dirty image on unclean exit is still
miles better than losing all your data.

> As an alternative, can we introduce .bdrv_flock() in protocol drivers, with
> similar semantics to flock(2) or lockf(3)? That way all formats can benefit,
> and a program crash will automatically drop the lock.

FWIW, the libvirt locking daemon (virtlockd) will already attempt to take out
locks using fcntl()/lockf() on all disk images associated with a VM. This
only protects against two QEMU emulators running at the same time though,
and also only if they're using libvirt APIs. So it doesn't protect if someone
runs qemu-img manually, or indeed if libvirt runs qemu-img, though we could
fairly easily address the latter.

A problem with lockf() is that it is almost unusable by design: if you have
2 (or more) file descriptors open against the same file and you close *any*
of them, all of the process's locks on the file are released, even if the
locks were acquired on a different file descriptor than the one being
closed :-(  (the sketch at the end of this mail illustrates this). This is
why we put our locking code into a completely separate process (virtlockd),
to guarantee nothing else might accidentally open/close file descriptors on
the same file we had locked.

A second problem with using flock()/lockf() is that on block devices the
locks are only scoped to the local host, so if you have shared block storage
the locks are not all that useful. To deal with this, virtlockd has the
concept of a "lockspace". The default lockspace is associated directly with
the disk files, but alternate lockspaces are possible which are indirectly
associated. For example, we have lockspaces that are keyed off the SCSI
unique volume ID, and the LVM volume UUID, which can be placed on a shared
filesystem. This lets us get cross-host locking even for block storage. We
have a future desire to be able to make use of storage-native locking
mechanisms too, such as SCSI reservations.

So while QEMU could add a bdrv_lock() driver method, it will have some
limitations & implementation complexity (ensuring nothing else in QEMU can
ever accidentally open+close the same file that QEMU has locked), though it
could offer better protection than we have with libvirt for cases where
people run qemu-img manually.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org             -o-              http://virt-manager.org :|
|: http://autobuild.org          -o-       http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org     -o-         http://live.gnome.org/gtk-vnc :|
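
To make the lockf()/fcntl() release-on-close behaviour described above
concrete, here is a minimal standalone sketch (a hypothetical demo program,
not code from QEMU or libvirt; the file path and helper name are made up).
It takes a whole-file write lock through one descriptor, then closes a
second, unrelated descriptor for the same file, probing from a child
process each time to see whether another process could grab the lock:

/*
 * Hypothetical demo of POSIX fcntl()/lockf() record-lock semantics:
 * closing *any* descriptor for a file drops *all* of the process's
 * locks on that file, even locks taken on a different descriptor.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int child_can_lock(const char *path)
{
    /* Probe from a child process: fcntl record locks are per-process,
     * so the child only succeeds if the parent no longer holds one. */
    pid_t pid = fork();
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        _exit(fd >= 0 && fcntl(fd, F_SETLK, &fl) == 0 ? 0 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/tmp/lockdemo.img";
    int fd1 = open(path, O_RDWR | O_CREAT, 0600);
    int fd2 = open(path, O_RDWR);          /* unrelated second fd */
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };

    if (fd1 < 0 || fd2 < 0 || fcntl(fd1, F_SETLK, &fl) < 0) {
        perror("setup");
        return 1;
    }
    printf("after locking fd1:  another process %s take the lock\n",
           child_can_lock(path) ? "CAN" : "can NOT");

    close(fd2);                            /* close the *other* fd ... */
    printf("after closing fd2:  another process %s take the lock\n",
           child_can_lock(path) ? "CAN" : "can NOT");
    return 0;
}

On a POSIX system this should print "can NOT" for the first probe and
"CAN" for the second, i.e. the lock taken on fd1 silently disappears the
moment fd2 is closed, which is exactly why virtlockd keeps lock file
descriptors in a process of their own.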