On Wed, 21 Apr 2010 10:36:29 +0200 Juan Quintela <quint...@redhat.com> wrote:
>     QTAILQ_FOREACH(dinfo, &drives, next) {
>         bs1 = dinfo->bdrv;
>         if (bdrv_has_snapshot(bs1)) {
>
> /// We found a device that has snapshots
>             ret = bdrv_snapshot_goto(bs1, name);
>             if (ret < 0) {
> /// And don't have a snapshot with the name that we wanted
>                 switch(ret) {
>                 case -ENOTSUP:
>                     error_report("%sSnapshots not supported on device '%s'",
>                                  bs != bs1 ? "Warning: " : "",
>                                  bdrv_get_device_name(bs1));
>                     break;
>                 case -ENOENT:
>                     error_report("%sCould not find snapshot '%s' on device '%s'",
>                                  bs != bs1 ? "Warning: " : "",
>                                  name, bdrv_get_device_name(bs1));
>                     break;
>                 default:
>                     error_report("%sError %d while activating snapshot on '%s'",
>                                  bs != bs1 ? "Warning: " : "",
>                                  ret, bdrv_get_device_name(bs1));
>                     break;
>                 }
>                 /* fatal on snapshot block device */
> // I think that one unconditional exit with prejudice could be in order here
>
> // Notice that bdrv_snapshot_goto() modifies the disk, name is as bad as
> // you can get. It just opens the disk, opens the snapshot, increases
> // its counter of users, and makes it available for use after here
> // (i.e. loading state, possibly conflicting with previous running
> // VM a.k.a. disk corruption).
>
>                 if (bs == bs1)
>                     return 0;
>
> // This error is as bad as it can get :( We have to load a vmstate,
> // and the disk that should have the memory image doesn't have it.
> // This is an error, I just put the wrong number the previous time.
> // Notice that this error should be very rare.

So, the current code is buggy, and if you fix it (by returning -1) you'll
get another bug: loadvm will stop the VM for trivial errors like a not
found image.

How do you plan to fix this?

> As stated, I don't think that trying to run the machine at any point
> would make any sense. Only case where it is safe to run it is if the
> failure is at get_bs_snapshots(), but at that point running the machine
> means:

Actually, it must not pause the VM when recovery is (clearly) possible,
otherwise it's a usability bug for the user Monitor and a possibly serious
bug when you don't have human intervention (e.g. QMP).

>
> <something happens>
> $ loadvm other_image
> Error "other_image" snapshot don't exist.
> $
>
> running the previous VM looks like something that should be done
> explicitly. If the error happened after that get_bs_snapshots(),
> We would need a new flag to just refuse to continue. Only valid
> operations at that point are other loadvm operations, i.e. our state is
> wrong one way or another.

It's not clear to me how this flag can help, but anyway, what we need
here is:

1. Fail when failure is reported (vs. report a failure and return OK)
2. Don't keep the VM paused when recovery is possible

If you can fix that, it's OK with me: I'll drop this and the next patch.
Otherwise I'll have to insist on the split.
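
To make points 1 and 2 concrete, here is a minimal, self-contained sketch.
It is not QEMU code: Device, LoadResult, fake_snapshot_goto() and
load_snapshot_sketch() are all hypothetical names, and the rule "recovery
is possible as long as no device has been switched to the snapshot yet" is
an assumption made for illustration only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a block device; not QEMU's BlockDriverState. */
typedef struct Device {
    const char *name;
    int goto_result;    /* what the fake snapshot switch returns: 0 or -errno */
} Device;

/* Hypothetical outcome of a loadvm attempt. */
typedef struct LoadResult {
    int ret;            /* 0 on success, negative errno on failure */
    bool keep_paused;   /* true only if some device was already switched */
} LoadResult;

/* Fake snapshot switch; a real one would modify the disk image. */
static int fake_snapshot_goto(const Device *dev, const char *snap)
{
    (void)snap;
    return dev->goto_result;
}

/*
 * devs[state_idx] plays the role of the device holding the VM state.
 * A failure there is fatal and is returned to the caller (point 1).
 * A failure on any other device is only a warning, as in the quoted code.
 * The VM is kept paused only if a device was already switched (point 2).
 */
static LoadResult load_snapshot_sketch(const Device *devs, size_t n,
                                       size_t state_idx, const char *snap)
{
    LoadResult res = { 0, false };
    size_t switched = 0;

    for (size_t i = 0; i < n; i++) {
        int ret = fake_snapshot_goto(&devs[i], snap);

        if (ret == 0) {
            switched++;
            continue;
        }

        fprintf(stderr, "%sError %d while activating snapshot '%s' on '%s'\n",
                i != state_idx ? "Warning: " : "", ret, snap, devs[i].name);

        if (i == state_idx) {
            res.ret = ret;                  /* report AND fail */
            res.keep_paused = switched > 0; /* untouched disks: safe to resume */
            return res;
        }
    }
    return res;
}
```

With this shape, the caller decides whether to resume the VM from
keep_paused instead of always stopping it, and a missing snapshot on the
state device can no longer be reported and then returned as success.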