[Qemu-devel] Re: [PATCH 04/22] savevm: do_loadvm(): Always resume the VM

Luiz Capitulino Wed, 21 Apr 2010 07:54:39 -0700

On Wed, 21 Apr 2010 10:36:29 +0200
Juan Quintela <quint...@redhat.com> wrote:


>     QTAILQ_FOREACH(dinfo, &drives, next) {
>         bs1 = dinfo->bdrv;
>         if (bdrv_has_snapshot(bs1)) {
> 
> /// We found a device that has snapshots
>             ret = bdrv_snapshot_goto(bs1, name);
>             if (ret < 0) {
> /// And it doesn't have a snapshot with the name that we wanted
>                 switch(ret) {
>                 case -ENOTSUP:
>                     error_report("%sSnapshots not supported on device '%s'",
>                                  bs != bs1 ? "Warning: " : "",
>                                  bdrv_get_device_name(bs1));
>                     break;
>                 case -ENOENT:
>                     error_report("%sCould not find snapshot '%s' on device '%s'",
>                                  bs != bs1 ? "Warning: " : "",
>                                  name, bdrv_get_device_name(bs1));
>                     break;
>                 default:
>                     error_report("%sError %d while activating snapshot on '%s'",
>                                  bs != bs1 ? "Warning: " : "",
>                                  ret, bdrv_get_device_name(bs1));
>                     break;
>                 }
>                 /* fatal on snapshot block device */
> // I think that one unconditional exit with prejudice could be in order here
> 
> // Notice that bdrv_snapshot_goto() modifies the disk; the name is as bad as
> // you can get.  It just opens the disk, opens the snapshot, increases
> // its counter of users, and makes it available for use after here
> // (i.e. loading state, possibly conflicting with the previously running
> // VM, a.k.a. disk corruption).
> 
>                 if (bs == bs1)
>                     return 0;
> 
> // This error is as bad as it gets :(  We have to load a vmstate,
> // and the disk that should have the memory image doesn't have it.
> // This is an error, I just put the wrong number the previous time.
> // Notice that this error should be very rare.

 So, the current code is buggy, and if you fix it (by returning -1)
you'll get another bug: loadvm will leave the VM stopped for trivial
errors like an image that is not found.

 How do you plan to fix this?

> As stated, I don't think that trying to run the machine at any point
> would make any sense.  Only case where it is safe to run it is if the
> failure is at get_bs_snapshots(), but at that point running the machine
> means:

 Actually, it must not pause the VM when recovery is (clearly) possible,
otherwise it's a usability bug for the user Monitor and possibly a serious
bug when there is no human intervention (e.g. QMP).

> 
> <something happens>
> $ loadvm other_image
>   Error "other_image" snapshot doesn't exist.
> $
> 
> running the previous VM looks like something that should be done
> explicitly.  If the error happened after get_bs_snapshots(),
> we would need a new flag to just refuse to continue.  The only valid
> operations at that point are other loadvm operations, i.e. our state is
> wrong one way or another.

 It's not clear to me how this flag can help, but anyway, what we need
here is:

1. Fail when failure is reported (vs. report a failure and return OK)

2. Don't keep the VM paused when recovery is possible
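
 To make the two points concrete, here is a minimal, self-contained
sketch. It is not QEMU code: struct vm, load_snapshot() and
do_loadvm_sketch() are hypothetical stand-ins, and the snapshot names
"missing" and "broken" are made up to simulate the two failure classes.
The assumption is that the loader can tell the caller whether any disk
was already switched before the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the VM run state; not a QEMU type. */
struct vm {
    bool running;
};

/*
 * Hypothetical loader. Returns 0 on success or a negative errno value.
 * *disk_touched reports whether any device was already switched to the
 * snapshot, i.e. whether resuming the old VM would be unsafe.
 */
int load_snapshot(const char *name, bool *disk_touched)
{
    assert(name != NULL && disk_touched != NULL);

    *disk_touched = false;
    if (strcmp(name, "missing") == 0) {
        return -ENOENT;         /* lookup failed, nothing was modified */
    }

    *disk_touched = true;       /* from here on the disks are switched */
    if (strcmp(name, "broken") == 0) {
        return -EIO;            /* failed after touching the disks */
    }
    return 0;
}

int do_loadvm_sketch(struct vm *vm, const char *name)
{
    bool was_running = vm->running;
    bool disk_touched;
    int ret;

    vm->running = false;        /* stop the VM before loading */
    ret = load_snapshot(name, &disk_touched);
    if (ret < 0) {
        fprintf(stderr, "loadvm '%s' failed: %s\n", name, strerror(-ret));
        /* Point 2: resume only when recovery is clearly possible. */
        if (was_running && !disk_touched) {
            vm->running = true;
        }
        /* Point 1: a reported failure is also returned as a failure. */
        return ret;
    }

    if (was_running) {
        vm->running = true;
    }
    return 0;
}
```

 With this shape, a not found snapshot returns an error and leaves the
VM running, while a failure after the disks were switched returns an
error and keeps the VM stopped.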

 If you can fix that, it's fine with me: I'll drop this and the next patch.

 Otherwise I'll have to insist on the split.
