>>> My question is, would it make more sense to add to the current mechanisms
>>> in Nova and Cinder rather than add the complexity of a new project?
> 
> I think the answer is yes  :)


I meant there is a clear need for the Raksha project. :)

Thanks,
Murali Balcha

On Aug 29, 2013, at 7:45 PM, "Murali Balcha" <murali.bal...@triliodata.com> 
wrote:

> 
> ________________________________________
>>> From: Ronen Kat <ronen...@il.ibm.com>
>>> Sent: Thursday, August 29, 2013 2:55 PM
>>> To: openstack-dev@lists.openstack.org; openstack-...@lists.launchpad.net
>>> Subject: Re: [openstack-dev] Proposal for Raksha, a Data Protection As a 
>>> Service project
> 
>>> Hi Murali,
> 
>>> I think the idea to provide enhanced data protection in OpenStack is a
>>> great idea, and I have been thinking about backup in OpenStack for a while
>>> now.
>>> I'm just not sure a new project is the only way to do it.
> 
>>> (as disclosure, I contributed code to enable IBM TSM as a Cinder backup
>>> driver)
> 
> Hi Kat,
> Consider the following use cases that Raksha addresses. I will discuss them 
> from the simplest to the most complex, and then address your specific 
> questions with inline comments.
> 1.    VM1 is created on the local file system and has a Cinder volume 
> attached.
> 2.    VM2 is booted from a Cinder volume and has a couple of Cinder 
> volumes attached.
> 3.    VM1 and VM2 are both booted from Cinder volumes and have a couple of 
> volumes attached. They also share a private network for internal 
> communication.
> In all these cases Raksha takes a consistent snapshot of the VMs, walks 
> through each VM's resources, and backs up those resources to a Swift 
> endpoint.
> In case 1, that means backing up the VM image and the Cinder volume image 
> to Swift. Case 2 is an extension of case 1.
> In case 3, Raksha not only backs up VM1 and VM2 and their associated 
> resources, it also backs up the network configuration.
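To make the walk described above concrete, here is a minimal sketch in Python. Everything here is hypothetical: `backup_vm` and the `swift_put` callback are illustrative stand-ins, not actual Raksha code or OpenStack SDK calls.

```python
def backup_vm(vm, swift_put):
    """Snapshot a VM and back up each of its resources to a Swift endpoint.

    `vm` is a plain dict describing the instance; `swift_put` uploads an
    object and returns its Swift reference. Both are illustrative only.
    """
    manifest = {"vm_id": vm["id"], "resources": []}

    # 1. Back up the VM image itself (case 1).
    manifest["resources"].append(
        {"type": "vm_image", "object": swift_put(vm["image"])})

    # 2. Walk every attached (or boot) Cinder volume (cases 1 and 2).
    for vol in vm.get("volumes", []):
        manifest["resources"].append(
            {"type": "volume", "id": vol["id"],
             "object": swift_put(vol["snapshot"])})

    # 3. Record shared network configuration if present (case 3).
    if vm.get("network"):
        manifest["resources"].append(
            {"type": "network", "config": vm["network"]})

    return manifest
```

The point of the manifest is that one backup record ties together everything a VM needs, which is exactly what the per-service backup APIs lack.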
> 
> Now let's consider the restore case. The restore operation walks through 
> the backed-up resources and calls into the respective OpenStack services to 
> restore those objects. In case 1, it first calls the Nova API to restore 
> the VM, then calls into Cinder to restore the volume and attach it to the 
> newly restored VM instance. In case 3, it also calls into the Neutron API 
> to restore the networking. Hence my argument is that no single OpenStack 
> project has a global view of a VM and all its resources, which is what an 
> effective backup and restore service needs.
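The restore walk can be sketched the same way: dispatch each backed-up resource to the service that owns it. The `nova`, `cinder`, and `neutron` objects below are hypothetical client facades, not the real python-novaclient/cinderclient/neutronclient APIs.

```python
def restore(manifest, nova, cinder, neutron):
    """Walk a backup manifest and call into each service to rebuild it."""
    vm_ref = None
    for res in manifest["resources"]:
        if res["type"] == "vm_image":
            # Nova restores the instance first...
            vm_ref = nova.restore_vm(res["object"])
        elif res["type"] == "volume":
            # ...Cinder restores each volume and reattaches it...
            vol_ref = cinder.restore_volume(res["object"])
            cinder.attach(vol_ref, vm_ref)
        elif res["type"] == "network":
            # ...and Neutron rebuilds the shared network (case 3).
            neutron.restore_network(res["config"])
    return vm_ref
```

The ordering matters: the VM must exist before volumes can be reattached, which is precisely the cross-service knowledge no single project holds today.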
> 
> 
>>> I wonder what is the added-value of a project approach versus enhancements
>>> to the current Nova and Cinder implementations of backup. Let me elaborate.
> 
>>> Nova has a "nova backup" feature that performs a backup of a VM to Glance,
>>> and the backup is managed by tenants in the same way that you propose.
>>> While today it provides only point-in-time full backup, it seems reasonable
>>> that it can be extended to support incremental and consistent backup as
>>> well, as the actual work is done either by the storage or hypervisor in
>>> any case.
> 
> Though Nova has an API to upload a snapshot of a VM to Glance, it does not 
> snapshot any volumes associated with the VM. When a snapshot is uploaded to 
> Glance, Nova creates an image by collapsing the qemu base image with its 
> delta file and uploads the larger file to Glance. For periodic backups of 
> VMs, this is a very inefficient way to do backup. Having to manage two 
> endpoints, one for Nova and one for Cinder, is also inefficient. These are 
> the gaps I called out on the Raksha wiki page.
> 
> 
>>> Cinder has a cinder backup command that performs a volume backup to Swift,
>>> Ceph or TSM. The Ceph implementation also supports incremental backup (Ceph
>>> to Ceph).
>>> I envision that Cinder could be expanded to support incremental backup (for
>>> persistent storage) by adding drivers/plug-ins that will leverage
>>> incremental backup features of either the storage or hypervisors.
>>> Independently, in Havana the ability to do consistent volume snapshots was
>>> added to GlusterFS. I assume that this consistency support could be
>>> generalized to support other volume drivers, and be utilized as part of the
>>> backup code.
> 
> I think we are talking about specific implementations here. Yes, I am aware 
> of the Ceph blueprint to support incremental backup, but the Cinder backup 
> APIs are volume specific. That means if a VM has multiple volumes mapped, 
> as in case 2 I discussed, the tenant needs to call the backup API three 
> times. Also, if you look at Cinder's Swift layout, it is very difficult to 
> tie the Swift objects back to a particular VM. Imagine a tenant restoring a 
> VM and all its resources from a backup copy that was taken a week ago; the 
> restore operation is not straightforward.
> It is my understanding that consistency should be maintained at the VM 
> level, not at the individual volume level. It is very difficult to assume 
> how the application data inside the VM is laid out.
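The granularity argument can be shown in a few lines. With volume-scoped backups, a VM with N volumes needs N independent calls whose results are unrelated; a VM-scoped job does the same per-volume work but groups it under one point-in-time record. All names here are hypothetical, for illustration only.

```python
def cinder_style_backup(volumes, backup_volume):
    """One call per volume; each returns an unrelated backup id.

    Nothing in the backup store ties these ids back to one VM or one
    point in time.
    """
    return [backup_volume(v) for v in volumes]


def raksha_style_backup(vm_id, volumes, backup_volume):
    """One job: the same per-volume work, grouped under a single
    point-in-time record that a later restore can walk."""
    return {"vm_id": vm_id, "pit": "t0",
            "backups": [backup_volume(v) for v in volumes]}
```

In the first style, reconstructing "VM2 as of a week ago" means the tenant must remember which backup ids belong together; in the second, the grouping is the backup.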
> 
>>> Looking at the key features in Raksha, it seems that the main features
>>> (2,3,4,7) could be addressed by improving the current mechanisms in Nova
>>> and Cinder. I didn't include 1 as a feature as it is more a statement of
>>> intent (or goal) than a feature.
>>> Features 5 (dedup) and 6 (scheduler) are indeed new in your proposal.
> 
>>> Looking at the source de-duplication feature, and taking Swift as an
>>> example, it seems reasonable that if Swift will implement de-duplication,
>>> then doing backup to Swift will give us de-duplication for free.
>>> In fact it would make sense to do the de-duplication at the Swift level
>>> instead of just the backup layer to gain more duplication opportunities.
> 
> I agree, however Swift is not the only object store that needs to support 
> dedupe. Ceph is another popular object store, GlusterFS supports a Swift 
> endpoint, and there are other commercially available object stores too. So 
> your argument becomes very product specific. Moreover, source-level dedupe 
> is different from dedupe at rest: source-level dedupe reduces the backup 
> window and also reduces the amount of data that needs to be pumped to the 
> backup endpoint, such as Swift.
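A minimal sketch of dedupe-at-source, under stated assumptions: hash fixed-size chunks and upload only the chunks the endpoint has not seen, so repeated backups push little data over the wire. A real implementation would use content-defined chunking and a persistent chunk index; this is illustrative only.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # 4 MiB chunk size, arbitrary for this sketch


def dedupe_upload(data, seen, upload):
    """Upload only new chunks; return the chunk-hash manifest.

    `seen` is the set of chunk digests already at the backup endpoint;
    `upload(digest, chunk)` pushes one chunk. Both are hypothetical.
    """
    manifest = []
    for off in range(0, len(data), CHUNK):
        chunk = data[off:off + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in seen:
            upload(digest, chunk)   # only unseen data crosses the wire
            seen.add(digest)
        manifest.append(digest)     # manifest still references every chunk
    return manifest
```

A second backup of unchanged data produces the same manifest but uploads nothing, which is exactly the backup-window saving that dedupe at rest cannot provide.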
> 
>>> Following the above, and assuming it all comes true (at times I am known
>>> to be an optimist), then we are left with backup job scheduling, and I
>>> wonder if that is enough for a new project.
> 
> I hope I have convinced you that Raksha has more to offer than a simple 
> cron job. Please take a look at the backup APIs, the database schema, and 
> the use cases it addresses on its wiki page.
> 
> 
> Bottom line: irrespective of how OpenStack is deployed, here is what the 
> Raksha workflow looks like:
> * Create-backupjob VM1, VM2
>       --> Returns backup job id, id1
> * Run-backupjob id1
>       --> Returns run id rid1
> * Run-backupjob id1
>       --> Returns run id rid2
> * Restore rid1
>       --> Restores the PiT of VM1 and VM2 and their associated volumes
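The job/run model implied by that workflow can be sketched as follows: one backup job covers several VMs, each run of the job is an independent point-in-time, and restore targets a run id. Class and method names are illustrative, not Raksha's actual API.

```python
import itertools


class BackupJobs:
    """Toy model of the job -> run -> restore workflow above."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.jobs = {}   # job id -> list of VMs it protects
        self.runs = {}   # run id -> {"job": ..., "vms": ...}

    def create_backupjob(self, *vms):
        job_id = "id%d" % next(self._ids)
        self.jobs[job_id] = list(vms)
        return job_id

    def run_backupjob(self, job_id):
        # Each run captures a fresh point-in-time of every VM in the job.
        run_id = "rid%d" % (len(self.runs) + 1)
        self.runs[run_id] = {"job": job_id, "vms": list(self.jobs[job_id])}
        return run_id

    def restore(self, run_id):
        # Restores the PiT of all VMs (and their resources) from that run.
        return self.runs[run_id]["vms"]
```

The key property is that `restore("rid1")` needs nothing but the run id: the job groups the VMs, and the run pins the point in time.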
> 
> 
>>> My question is, would it make more sense to add to the current mechanisms
>>> in Nova and Cinder rather than add the complexity of a new project?
> 
> I think the answer is yes  :)
> 
> Regards,
> Murali Balcha
>>> __________________________________________
>>> Ronen I. Kat
>>> Storage Research
>>> IBM Research - Haifa
>>> Phone: +972.3.7689493
>>> Email: ronen...@il.ibm.com
> 
> From:   Murali Balcha <murali.bal...@triliodata.com>
> To:     "openstack-dev@lists.openstack.org"
>            <openstack-dev@lists.openstack.org>,
>            "openst...@list.openstack.org" <openst...@list.openstack.org>,
> Date:   29/08/2013 01:18 AM
> Subject:        [openstack-dev] Proposal for Raksha, a Data Protection As a
>            Service project
> 
> 
> 
> Hello Stackers,
> We would like to introduce a new project, Raksha, a Data Protection As a
> Service (DPaaS) for the OpenStack cloud.
> Raksha’s primary goal is to provide comprehensive data protection for
> OpenStack by leveraging Nova, Swift, Glance and Cinder. Raksha has the
> following key features:
>      1.  Enterprise-grade data protection for OpenStack-based clouds
>      2.  Tenant-administered backups and restores
>      3.  Application-consistent backups
>      4.  Point-in-Time (PiT) full and incremental backups and restores
>      5.  Dedupe at source for efficient backups
>      6.  A job scheduler for periodic backups
>      7.  A noninvasive backup solution that does not require service
>      interruption during the backup window
> 
> You will find the rationale behind the need for Raksha in OpenStack on its
> wiki. The wiki also has the preliminary design and the API description.
> Some of Raksha's functionality may overlap with the Nova and Cinder
> projects, and as a community let's work together to coordinate the features
> among these projects. We would like to seek early feedback so we can
> address as many issues as we can in the first code drop. We are hoping to
> enlist the OpenStack community's help in making Raksha a part of OpenStack.
> Raksha’s project resources:
> Wiki: https://wiki.openstack.org/wiki/Raksha
> Launchpad: https://launchpad.net/raksha
> Github: https://github.com/DPaaS-Raksha/Raksha (We will upload prototype
> code in a few days)
> If you want to talk to us, send an email to
> openstack-...@lists.launchpad.net with "[raksha]" in the subject or use the
> #openstack-raksha IRC channel.
> 
> Best Regards,
> Murali Balcha
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
