Whether the hypervisor snapshot happens depends on whether the 'quiesce' option is specified with the snapshot request. If a user doesn't care about the consistency of their backup, the hypervisor snapshot/quiesce step can be skipped altogether. This is not the case, of course, when the default provider is being used: there, a hypervisor snapshot is the only way to create a backup, since the work cannot be offloaded to a storage driver.

--
Chris Suich
chris.su...@netapp.com
NetApp Software Engineer
Data Center Platforms – Cloud Solutions
Citrix, Cisco & Red Hat
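That decision, as a minimal Java sketch (the class and method names below are illustrative, not CloudStack's actual API):

    // Hypothetical sketch: when must a hypervisor snapshot happen?
    public class VmSnapshotDecision {
        /**
         * A hypervisor snapshot is taken when the caller asked for a quiesced
         * (consistent) backup, or when the default provider is in use, since
         * the default provider cannot offload the work to a storage driver.
         */
        static boolean needsHypervisorSnapshot(boolean quiesceRequested,
                                               boolean usingDefaultProvider) {
            return quiesceRequested || usingDefaultProvider;
        }

        public static void main(String[] args) {
            // Fast, possibly inconsistent backup offloaded to storage:
            System.out.println(needsHypervisorSnapshot(false, false)); // false
            // Default provider: hypervisor snapshot is the only option:
            System.out.println(needsHypervisorSnapshot(false, true));  // true
        }
    }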
On Oct 8, 2013, at 4:57 PM, Darren Shepherd <darren.s.sheph...@gmail.com> wrote:

> Who is going to decide whether the hypervisor snapshot should actually
> happen or not? Or how?
>
> Darren
>
> On Tue, Oct 8, 2013 at 12:38 PM, SuichII, Christopher
> <chris.su...@netapp.com> wrote:
>>
>> --
>> Chris Suich
>> chris.su...@netapp.com
>> NetApp Software Engineer
>> Data Center Platforms – Cloud Solutions
>> Citrix, Cisco & Red Hat
>>
>> On Oct 8, 2013, at 2:24 PM, Darren Shepherd <darren.s.sheph...@gmail.com> wrote:
>>
>>> So in the implementation, when we say "quiesce", is that actually being
>>> implemented as a VM snapshot (memory and disk)? And then when you say
>>> "unquiesce", are you talking about deleting the VM snapshot?
>>
>> If the VM snapshot itself is not going to the hypervisor, then yes, the
>> quiesce will actually be a hypervisor snapshot. Just to be clear, the
>> unquiesce is not quite a delete - it is a collapse of the VM snapshot
>> and the active VM back into one file.
>>
>>> In NetApp, what are you snapshotting? The whole NetApp volume (I
>>> don't know the correct term), a file on NFS, an iSCSI volume? I don't
>>> know a whole heck of a lot about the NetApp snapshot capabilities.
>>
>> Essentially, we are using internal APIs to create file-level backups -
>> don't worry too much about the terminology.
>>
>>> I know storage solutions can snapshot better and faster than
>>> hypervisors can with COW files. I've personally just always been
>>> perplexed about the best way to implement it. For storage solutions
>>> that are block based, it's really easy to have the storage do the
>>> snapshot. For shared file systems, like NFS, it seems way more
>>> complicated, as you don't want to snapshot the entire filesystem in
>>> order to snapshot one file.
>>
>> With filesystems like NFS, things are certainly more complicated, but
>> that is taken care of by our controller's operating system, Data ONTAP,
>> and we simply use APIs to communicate with it.
>>
>>> Darren
>>>
>>> On Tue, Oct 8, 2013 at 11:10 AM, SuichII, Christopher
>>> <chris.su...@netapp.com> wrote:
>>>> I can comment on the second half.
>>>>
>>>> Through storage operations, storage providers can create backups much
>>>> faster than hypervisors, and over time their snapshots are more
>>>> efficient than the snapshot chains that hypervisors create. It is true
>>>> that a VM snapshot taken at the storage level is slightly different,
>>>> as it would be pseudo-quiesced and would not have its memory
>>>> snapshotted. This is accomplished through hypervisor snapshots:
>>>>
>>>> 1) VM snapshot request (let's say for VM 'A')
>>>> 2) Create hypervisor snapshot (optional)
>>>>    - VM 'A' is snapshotted, creating active VM 'A*'
>>>>    - All disk traffic now goes to VM 'A*', and 'A' is a snapshot of 'A*'
>>>> 3) Storage driver(s) take snapshots of each volume
>>>> 4) Undo hypervisor snapshot (optional)
>>>>    - VM snapshot 'A' is rolled back into VM 'A*', so the hypervisor
>>>>      snapshot no longer exists
>>>>
>>>> Now, a couple of notes:
>>>> - The reason steps 2 and 4 are optional is that not all users
>>>>   necessarily care about the memory or disk consistency of their VMs
>>>>   and would prefer faster snapshots to consistency.
>>>> - Preemptively: yes, we are actually taking hypervisor snapshots,
>>>>   which means there isn't a performance gain at snapshot time when
>>>>   quiescing the VM. However, the performance gain will come both
>>>>   during restoring the VM and during normal operations, as described
>>>>   above.
>>>>
>>>> Although you can think of it as a poor man's VM snapshot, I would
>>>> think of it more as a consistent multi-volume snapshot. Again, the
>>>> difference is that this snapshot was not truly quiesced the way a
>>>> hypervisor snapshot would be.
>>>>
>>>> --
>>>> Chris Suich
>>>> chris.su...@netapp.com
>>>> NetApp Software Engineer
>>>> Data Center Platforms – Cloud Solutions
>>>> Citrix, Cisco & Red Hat
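Steps 1-4 described above, sketched in Java (Hypervisor, StorageDriver and SnapshotHandle are hypothetical stand-ins, not real CloudStack types):

    import java.util.List;

    // Sketch of the optional-quiesce backup flow from the thread above.
    public class VmBackupFlow {
        interface SnapshotHandle {}
        interface Hypervisor {
            SnapshotHandle snapshotVM(String vm);          // step 2: 'A' becomes a snapshot, 'A*' is active
            void collapseSnapshot(SnapshotHandle handle);  // step 4: roll 'A' back into 'A*'
        }
        interface StorageDriver {
            void snapshotVolume(String volume);            // step 3: storage-level snapshot
        }

        static void backupVM(String vm, List<String> volumes, boolean quiesce,
                             Hypervisor hypervisor, StorageDriver driver) {
            SnapshotHandle handle = null;
            if (quiesce) {
                handle = hypervisor.snapshotVM(vm);   // optional hypervisor snapshot
            }
            for (String volume : volumes) {
                driver.snapshotVolume(volume);        // snapshot each volume at the storage level
            }
            if (quiesce) {
                hypervisor.collapseSnapshot(handle);  // undo the hypervisor snapshot
            }
        }
    }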
>>>> On Oct 8, 2013, at 1:47 PM, Darren Shepherd <darren.s.sheph...@gmail.com> wrote:
>>>>
>>>>> My only comment is that having the return type as boolean, and using
>>>>> that to indicate quiesce behaviour, seems obscure and will probably
>>>>> lead to a problem later. You're basically saying the result of
>>>>> takeVMSnapshot will only ever need to communicate back whether
>>>>> unquiesce needs to happen. Maybe some result object would be more
>>>>> extensible.
>>>>>
>>>>> Actually, I think I have more comments. This seems a bit odd to me.
>>>>> Why would a storage driver in ACS implement VM snapshot
>>>>> functionality? A VM snapshot is really a hypervisor-orchestrated
>>>>> operation, so it seems like we're trying to implement a poor man's VM
>>>>> snapshot. Maybe if I understood what NetApp was trying to do it would
>>>>> make more sense, but it's all odd. To do a proper VM snapshot, you
>>>>> need to snapshot memory and disk at the exact same time. How are we
>>>>> going to do that if ACS is orchestrating the VM snapshot and
>>>>> delegating to storage providers? It's not like you are going to pause
>>>>> the VM.... or are you?
>>>>>
>>>>> Darren
>>>>>
>>>>> On Mon, Oct 7, 2013 at 11:59 AM, Edison Su <edison...@citrix.com> wrote:
>>>>>> I created a design document page at
>>>>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Pluggable+VM+snapshot+related+operations,
>>>>>> feel free to add items to it. A new branch, "pluggable_vm_snapshot",
>>>>>> has also been created.
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: SuichII, Christopher [mailto:chris.su...@netapp.com]
>>>>>>> Sent: Monday, October 07, 2013 10:02 AM
>>>>>>> To: <dev@cloudstack.apache.org>
>>>>>>> Subject: Re: [DISCUSS] Pluggable VM snapshot related operations?
>>>>>>>
>>>>>>> I'm a fan of option 2 - it gives us the most flexibility (as you
>>>>>>> stated). The option is given to completely override the way VM
>>>>>>> snapshots work, AND storage providers are given the opportunity to
>>>>>>> work within the default VM snapshot workflow.
>>>>>>>
>>>>>>> I believe this option should satisfy your concern, Mike. The
>>>>>>> snapshot and quiesce strategy would be in charge of communicating
>>>>>>> with the hypervisor. Storage providers should be able to leverage
>>>>>>> the default strategies and simply perform the storage operations.
>>>>>>>
>>>>>>> I don't think it should be much of an issue that new methods on the
>>>>>>> storage driver interface may not apply to everyone. In fact, that
>>>>>>> is already the case. Some methods, such as un/maintain(),
>>>>>>> attachToXXX() and takeSnapshot(), are already not implemented by
>>>>>>> every driver - they just return false when asked if they can handle
>>>>>>> the operation.
>>>>>>>
>>>>>>> --
>>>>>>> Chris Suich
>>>>>>> chris.su...@netapp.com
>>>>>>> NetApp Software Engineer
>>>>>>> Data Center Platforms - Cloud Solutions
>>>>>>> Citrix, Cisco & Red Hat
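Darren's "result object" suggestion above could look something like the following sketch (VMSnapshotResult is a hypothetical name, not an existing CloudStack class):

    // A result object is more extensible than a bare boolean: new fields
    // can be added later without changing the strategy's method signature.
    public class VMSnapshotResult {
        private final boolean needsUnquiesce;
        private final String message; // room to grow: timings, error details, ...

        public VMSnapshotResult(boolean needsUnquiesce, String message) {
            this.needsUnquiesce = needsUnquiesce;
            this.message = message;
        }

        public boolean needsUnquiesce() { return needsUnquiesce; }
        public String getMessage() { return message; }
    }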
>>>>>>>
>>>>>>> On Oct 5, 2013, at 12:11 AM, Mike Tutkowski
>>>>>>> <mike.tutkow...@solidfire.com> wrote:
>>>>>>>
>>>>>>>> Well, my first thought on this is that the storage driver should
>>>>>>>> not be telling the hypervisor to do anything. It should be
>>>>>>>> responsible for creating/deleting volumes, snapshots, etc. on its
>>>>>>>> storage system only.
>>>>>>>>
>>>>>>>> On Fri, Oct 4, 2013 at 5:57 PM, Edison Su <edison...@citrix.com> wrote:
>>>>>>>>
>>>>>>>>> In 4.2, we added VM snapshots for VMware/XenServer. The current
>>>>>>>>> workflow is the following:
>>>>>>>>> createVMSnapshot API -> VMSnapshotManagerImpl: createVMSnapshot ->
>>>>>>>>> send CreateVMSnapshotCommand to the hypervisor to create the VM
>>>>>>>>> snapshot.
>>>>>>>>>
>>>>>>>>> If anybody wants to change the workflow, they need to either
>>>>>>>>> change VMSnapshotManagerImpl directly or subclass
>>>>>>>>> VMSnapshotManagerImpl. Neither is the ideal choice, as
>>>>>>>>> VMSnapshotManagerImpl should be able to handle different ways of
>>>>>>>>> taking a VM snapshot, instead of hard-coding one.
>>>>>>>>>
>>>>>>>>> The requirements for the pluggable VM snapshot come from:
>>>>>>>>> Storage vendors may have their own optimizations, such as NetApp.
>>>>>>>>> VM snapshots can be implemented in a totally different way (for
>>>>>>>>> example, I could just send a command to the guest VM telling my
>>>>>>>>> application to flush disks and hold disk writes, then go to the
>>>>>>>>> hypervisor to take a volume snapshot).
>>>>>>>>>
>>>>>>>>> If we agree on enabling pluggable VM snapshots, then we can move
>>>>>>>>> on to discussing how to implement it.
>>>>>>>>>
>>>>>>>>> The possible options:
>>>>>>>>> 1. Coarse-grained interface. Add a VMSnapshotStrategy interface,
>>>>>>>>> which has the following methods:
>>>>>>>>> VMSnapshot takeVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>> boolean revertVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>> boolean deleteVMSnapshot(VMSnapshot vmSnapshot);
>>>>>>>>>
>>>>>>>>> The workflow will be: createVMSnapshot API ->
>>>>>>>>> VMSnapshotManagerImpl: createVMSnapshot -> VMSnapshotStrategy:
>>>>>>>>> takeVMSnapshot
>>>>>>>>> VMSnapshotManagerImpl will manage VM state and do the sanity
>>>>>>>>> checks, then hand over to VMSnapshotStrategy.
>>>>>>>>> A VMSnapshotStrategy implementation may just send a
>>>>>>>>> create/revert/delete VMSnapshotCommand to the hypervisor host, or
>>>>>>>>> do any special operations.
>>>>>>>>>
>>>>>>>>> 2. Fine-grained interface. Not only add a VMSnapshotStrategy
>>>>>>>>> interface, but also add certain methods on the storage driver.
>>>>>>>>> The VMSnapshotStrategy interface will be the same as in option 1.
>>>>>>>>> The following methods will be added to the storage driver:
>>>>>>>>> /* volumesBelongToVM is the list of the VM's volumes that are
>>>>>>>>> created on this storage; the storage vendor can either take one
>>>>>>>>> snapshot of these volumes in one shot, or take a snapshot of each
>>>>>>>>> volume separately.
>>>>>>>>> The pre-condition: the VM has been quiesced.
>>>>>>>>> It returns a boolean to indicate whether the VM needs to be
>>>>>>>>> unquiesced or not. The default storage driver returns false.
>>>>>>>>> */
>>>>>>>>> boolean takeVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>>> boolean revertVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>>> boolean deleteVMSnapshot(List<VolumeInfo> volumesBelongToVM,
>>>>>>>>> VMSnapshot vmSnapshot);
>>>>>>>>>
>>>>>>>>> The workflow will be: createVMSnapshot API ->
>>>>>>>>> VMSnapshotManagerImpl: createVMSnapshot -> VMSnapshotStrategy:
>>>>>>>>> takeVMSnapshot -> storage driver: takeVMSnapshot. In the
>>>>>>>>> implementation of VMSnapshotStrategy's takeVMSnapshot, the
>>>>>>>>> pseudocode looks like:
>>>>>>>>> HypervisorHelper.quiesceVM(vm);
>>>>>>>>> val volumes = vm.getVolumes();
>>>>>>>>> val maps = new Map[StorageDriver, List[VolumeInfo]]();
>>>>>>>>> volumes.foreach(volume => maps.put(volume.getDriver(),
>>>>>>>>> volume :: maps.get(volume.getDriver())));
>>>>>>>>> var needUnquiesce = true;
>>>>>>>>> maps.foreach((driver, volumes) => needUnquiesce =
>>>>>>>>> needUnquiesce && driver.takeVMSnapshot(volumes, vmSnapshot));
>>>>>>>>> if (needUnquiesce) {
>>>>>>>>> HypervisorHelper.unquiesce(vm);
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> By default, quiesceVM in HypervisorHelper will actually take a VM
>>>>>>>>> snapshot through the hypervisor.
>>>>>>>>> Does the above logic make sense?
>>>>>>>>>
>>>>>>>>> The pro of option 1 is that it's simple: no need to change the
>>>>>>>>> storage driver interfaces. The con is that each storage vendor
>>>>>>>>> needs to implement a strategy, and they may all end up doing the
>>>>>>>>> same thing.
>>>>>>>>> The pro of option 2 is that the storage driver won't need to
>>>>>>>>> worry about how to quiesce/unquiesce the VM. The con is that it
>>>>>>>>> adds these methods to every storage driver, so it assumes this
>>>>>>>>> workflow will work for everybody.
>>>>>>>>>
>>>>>>>>> So which option should we take? Or, if you have other options,
>>>>>>>>> please let us know.
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Mike Tutkowski*
>>>>>>>> *Senior CloudStack Developer, SolidFire Inc.*
>>>>>>>> e: mike.tutkow...@solidfire.com
>>>>>>>> o: 303.746.7302
>>>>>>>> Advancing the way the world uses the
>>>>>>>> cloud<http://solidfire.com/solution/overview/?video=play>
>>>>>>>> *(tm)*
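Edison's pseudocode above, rendered as a compilable Java sketch (VolumeInfo, VMSnapshot, StorageDriver, VirtualMachine and HypervisorHelper are stand-ins for the real CloudStack types, and the helper bodies are only stubbed):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Sketch of option 2's default VMSnapshotStrategy flow.
    public class DefaultVMSnapshotStrategy {

        interface VMSnapshot {}
        interface VolumeInfo { StorageDriver getDriver(); }
        interface StorageDriver {
            // Returns true if the VM still needs to be unquiesced afterwards.
            boolean takeVMSnapshot(List<VolumeInfo> volumes, VMSnapshot vmSnapshot);
        }
        interface VirtualMachine { List<VolumeInfo> getVolumes(); }

        static class HypervisorHelper {
            // By default, quiescing is implemented as a hypervisor snapshot.
            static void quiesceVM(VirtualMachine vm) { /* take hypervisor snapshot */ }
            // Unquiesce collapses the hypervisor snapshot back into the active VM.
            static void unquiesce(VirtualMachine vm) { /* collapse snapshot */ }
        }

        public void takeVMSnapshot(VirtualMachine vm, VMSnapshot vmSnapshot) {
            HypervisorHelper.quiesceVM(vm);

            // Group the VM's volumes by the storage driver that owns them.
            Map<StorageDriver, List<VolumeInfo>> byDriver = new HashMap<>();
            for (VolumeInfo volume : vm.getVolumes()) {
                byDriver.computeIfAbsent(volume.getDriver(), d -> new ArrayList<>())
                        .add(volume);
            }

            // Each driver snapshots its volumes; the VM is unquiesced only if
            // every driver asks for it (the default driver returns false, since
            // its hypervisor snapshot *is* the VM snapshot and must remain).
            boolean needUnquiesce = true;
            for (Map.Entry<StorageDriver, List<VolumeInfo>> entry : byDriver.entrySet()) {
                needUnquiesce &= entry.getKey().takeVMSnapshot(entry.getValue(), vmSnapshot);
            }

            if (needUnquiesce) {
                HypervisorHelper.unquiesce(vm);
            }
        }
    }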