On 02/20/2012 12:48 PM, Eric Blake wrote:
> On 02/20/2012 10:31 AM, Jeff Cody wrote:
>> In the case of a failure in a group snapshot, it is possible for
>> multiple file image failures to occur - for instance, failure of
>> an original snapshot, and then failure of one or more of the
>> attempted reopens of the original.
>>
>> Knowing all of the file images which failed could be useful or
>> critical information, so this command returns a list of strings
>> containing the filenames of all failures from the last
>> invocation of blockdev-group-snapshot-sync.
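
For reference, the intended usage is just a query issued after a failed
group snapshot; the exchange would look roughly like this (the command
name and the filenames below are placeholders, not the final interface):

    -> { "execute": "query-group-snapshot-failed-files" }
    <- { "return": [ "/images/vm-disk0-snap.qcow2",
                     "/images/vm-disk1-snap.qcow2" ] }

Nothing beyond the list of failed filenames is returned.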

> Meta-question:
>
> Suppose that the guest is running when we issue
> blockdev-group-snapshot-sync - in that case, qemu is responsible for
> pausing and then resuming the guest. On success, this makes sense. But
> what happens on failure?

The guest is not paused in blockdev-group-snapshot-sync; I don't think
that qemu should enforce pause/resume in the live snapshot commands.

> If we only fail at creating one snapshot, but successfully roll back the
> rest of the set, should the guest be resumed (as if the command had
> never been attempted), or should the guest be left paused?
>
> On the other hand, if we fail at creating one snapshot, as well as fail
> at rolling back, then that argues that we _cannot_ resume the guest,
> because we no longer have a block device open.

Is that really true, though? Depending on which drive failed, the guest
may still be runnable. To the guest it would look roughly like a drive
failure: a bad event, but not always fatal.

But I think v2 of the patch may make this moot - I was talking with
Kevin, and he had some good ideas on how to do this without requiring a
close & reopen in the case of a snapshot failure, which means that we
should not have to worry about the second scenario. I am going to
incorporate those changes into v2.

> This policy needs to be documented in one (or both) of the two new
> monitor commands, and we probably ought to make sure that if the guest
> is left paused where it had originally started as running, then an
> appropriate event is also emitted.

I agree, the documentation should make it clear what is going on - I
will add that to v2.

> For blockdev-snapshot-sync, libvirt was always pausing qemu before
> issuing the snapshot, then resuming afterwards; but now that we have the
> ability to make the set atomic, I'm debating about whether libvirt still
> needs to pause qemu, or whether it can now rely on qemu doing the right
> things about pausing and resuming as part of the snapshot command.

Again, it doesn't pause automatically, so that is up to libvirt. The
guest agent is also available to freeze the filesystems, if libvirt
wants to trust it (and it is running); if not, then libvirt can still
issue a pause/resume around the snapshot command (and libvirt may be in
a better position to decide what to do in case of failure, if it has
some knowledge of the drives that failed and how they are used).
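
To make that concrete, a management-side sequence around the group
snapshot could look roughly like the following sketch. guest-fsfreeze-freeze
and guest-fsfreeze-thaw are the existing guest agent commands (stop/cont
over QMP would be the alternative), but the argument layout shown for
blockdev-group-snapshot-sync is only illustrative of the proposal, and
the device names and paths are made up:

    (guest agent)  -> { "execute": "guest-fsfreeze-freeze" }
                   <- { "return": 2 }

    (QMP)          -> { "execute": "blockdev-group-snapshot-sync",
                        "arguments": { "dev": [
                          { "device": "virtio0", "snapshot-file": "/images/sn0.qcow2" },
                          { "device": "virtio1", "snapshot-file": "/images/sn1.qcow2" } ] } }
                   <- { "return": {} }

    (guest agent)  -> { "execute": "guest-fsfreeze-thaw" }
                   <- { "return": 2 }

On failure, libvirt would get an error back instead of the empty return,
and can then decide how to proceed (thaw, resume, or leave the guest
paused) based on which drives were involved and how they are used.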