On 06/07/2013 03:31 PM, Kevin Wolf wrote: > Am 06.06.2013 um 15:09 hat Liu Yuan geschrieben: >> On 06/06/2013 08:46 PM, Kevin Wolf wrote: >>> Am 06.06.2013 um 13:57 hat Liu Yuan geschrieben: >>>> Just call sd_create_branch() to rollback the image is good enough >>>> >>>> Cc: qemu-devel@nongnu.org >>>> Cc: MORITA Kazutaka <morita.kazut...@lab.ntt.co.jp> >>>> Cc: Kevin Wolf <kw...@redhat.com> >>>> Cc: Stefan Hajnoczi <stefa...@redhat.com> >>>> Signed-off-by: Liu Yuan <namei.u...@gmail.com> >>>> --- >>>> block/sheepdog.c | 8 +++++--- >>>> 1 file changed, 5 insertions(+), 3 deletions(-) >>>> >>>> diff --git a/block/sheepdog.c b/block/sheepdog.c >>>> index 94218ac..cb5ca4a 100644 >>>> --- a/block/sheepdog.c >>>> +++ b/block/sheepdog.c >>>> @@ -2072,9 +2072,11 @@ static int sd_snapshot_goto(BlockDriverState *bs, >>>> const char *snapshot_id) >>>> } >>>> >>>> if (!s->inode.vm_state_size) { >>>> - error_report("Invalid snapshot"); >>>> - ret = -ENOENT; >>>> - goto out; >>>> + /* qemu-img asks us to rollback, we need to do it right now */ >>>> + ret = sd_create_branch(s); >>>> + if (ret) { >>>> + goto out; >>>> + } >>>> } >>> >>> I'm not sure how snapshots work internally for Sheepdog, but it seems >>> odd to me that you need to do this only for disk-only snapshots, but not >>> when the snapshot has VM state. (Also, note that 'qemu-img snapshot -a' >>> works on images with a VM state, so the comment doesn't seem to be >>> completely accurate) >>> >>> Would you mind explaining to me how this works in detail? >>> >> >> Hmm, the original code isn't written by me and this snapshot mechanism >> exists since day 0. I just hacks it to work now. So I'll try to explain >> on my understanding. >> >> When we do a snapshot such as 'savedvm' or 'qemu-img snapshot', the >> active vdi is snapshotted and marked as snapshot and a new vdi is >> created as copy-on-write on the previous active vdi, then this new vdi >> becomes active vdi. For e.g, >> >> As1 --> As2 --> A >> >> We take snapshot of vdi A twice, tagged s1 and s2 respectively. I guess >> this is quit similar to qcow2 snapshots, only inode object with a bitmap >> is created. >> >> So when we 'loadvm' or 'qemu-img snapshot -a' to A, current logic just >> handle 'loadvm', that .bdrv_snapshot_goto only reloads inode object, >> that is, for e.g, we 'savevm s1', and mark it as snapshot, the chain >> would like >> >> As1 --> As2 --> A >> | >> v >> just reload As1's inode object >> >> Only when the write comes from VM, we do the following stuff >> - delete active vdi A >> - created a new inode based on the previously reloaded As1's inode > > Thanks, this is the part that I missed. > > I'm not sure however why the actual switch is delayed until the first > write. This seems inconsistent with qcow2 snapshots. > > Do you know if there is a reason why we can't always do this already > during bdrv_snapshot_goto()? >
I think the reason is sd_load_vmstate() need to load vm state objects with the correct inode object. I tried to remove if (!s->inode.vm_state_size) and make sd_create_branch unconditional. This means 'loadvm' command will try to call sd_create_branch() inside sd_snapshot_goto(). But failed with reading the wrong snapshot because the vdi's inode object is changed by sd_create_branch(). >> The chain will look like: >> >> As1 --> As2 >> | >> V >> A >> >> This is how sheepdog handles savevm/loadvm. >> >> So for 'qemu-img snapshot -a', we should create the branch in the >> .bdrv_snapshot_goto. >> >> As you pointed out, we need to consider vm state even for 'snapshot -a', >> so I need to rework the patch 2/2. > > Yes, the presence of VM state is independent from whether you're using > qemu-img or loadvm. And it actually goes both ways: qemu-img can revert > to snapshots that have a VM state, and loadvm can be used with images > that don't have a VM state (if you have multiple images, only one of > them has the VM state). > Seems not true of current code. If I 'loadvm' a snapshot without a vmstate, I'll get 'qemu-system-x86_64: This is a disk-only snapshot. Revert to it offline using qemu-img'. But 'qemu-img snapshot -a' works as you said, it can rollback to the snapshot regardless of vmstate. Also this is a difference to store vmstate for sheepdog images. *Every* snapshot image can have its own vmstate stored in sheepdog cluster. That is, we can have multiple snapshot with its own private vmstate for sheepdog. I think my patch did the correct thing, just rollback the disk state of the snapshot for 'qemu-img snapshot -a'. Anyway, I found a minor issue of 2/2 patch, so I'll resend the set. Thanks, Yuan