On Fri, Aug 30, 2013 at 8:02 AM, Murali Balcha <murali.bal...@triliodata.com> wrote:
> Hi John,
> Thanks for your comments. I am planning to attend the summit, so we can have a wider discussion there.
>
> Thanks,
> Murali Balcha
>
> On Aug 30, 2013, at 12:05 AM, "John Griffith" <john.griff...@solidfire.com> wrote:
>
> On Thu, Aug 29, 2013 at 6:36 PM, Murali Balcha <murali.bal...@triliodata.com> wrote:
>
>> >>> My question is, would it make sense to add to the current mechanisms in Nova and Cinder rather than add the complexity of a new project?
>> >
>> > I think the answer is yes :)
>>
>> I meant there is a clear need for the Raksha project. :)
>>
>> Thanks,
>> Murali Balcha
>>
>> On Aug 29, 2013, at 7:45 PM, "Murali Balcha" <murali.bal...@triliodata.com> wrote:
>>
>> > ________________________________________
>> >>> From: Ronen Kat <ronen...@il.ibm.com>
>> >>> Sent: Thursday, August 29, 2013 2:55 PM
>> >>> To: openstack-dev@lists.openstack.org; openstack-...@lists.launchpad.net
>> >>> Subject: Re: [openstack-dev] Proposal for Raksha, a Data Protection As a Service project
>> >
>> >>> Hi Murali,
>> >
>> >>> I think the idea to provide enhanced data protection in OpenStack is a great idea, and I have been thinking about backup in OpenStack for a while now.
>> >>> I am just not sure a new project is the only way to do it.
>> >
>> >>> (as disclosure, I contributed code to enable IBM TSM as a Cinder backup driver)
>> >
>> > Hi Kat,
>> > Consider the following use cases that Raksha will address. I will go from the simplest to the most complex use case and then address your specific questions with inline comments.
>> > 1. VM1, which is created on the local file system with a Cinder volume attached
>> > 2. VM2, which is booted from a Cinder volume and has a couple of Cinder volumes attached
>> > 3. VM1 and VM2, both booted from Cinder volumes, each with a couple of volumes attached. They also share a private network for internal communication.
>> > 4.
>> > In all these cases Raksha will take a consistent snapshot of the VMs, walk through each VM's resources, and back up those resources to a Swift endpoint.
>> > In case 1, that means backing up the VM image and the Cinder volume image to Swift.
>> > Case 2 is an extension of case 1.
>> > In case 3, Raksha not only backs up VM1 and VM2 and their associated resources, it also backs up the network configuration.
>> >
>> > Now let's consider the restore case. The restore operation walks through the backed-up resources and calls into the respective OpenStack services to restore those objects. In case 1, it first calls the Nova API to restore the VM, then calls into Cinder to restore the volume and attach it to the newly restored VM instance. In case 3, it also calls into the Neutron API to restore the networking. Hence my argument is that no single OpenStack project has a global view of a VM and all its resources, which is what an effective backup and restore service needs.
>> >
>> >>> I wonder what is the added value of a project approach versus enhancements to the current Nova and Cinder implementations of backup. Let me elaborate.
>> >
>> >>> Nova has a "nova backup" feature that performs a backup of a VM to Glance; the backup is managed by tenants in the same way that you propose.
>> >>> While today it provides only point-in-time full backup, it seems reasonable that it can be extended to support incremental and consistent backup as well, as the actual work is done either by the storage or the hypervisor in any case.
>> >
>> > Though Nova has an API to upload a snapshot of the VM to Glance, it does not snapshot any volumes associated with the VM. When a snapshot is uploaded to Glance, Nova creates an image by collapsing the qemu image with the delta file and uploads the larger file to Glance. If we were to perform periodic backups of VMs, this is a very inefficient way to do backup.
>> > Also, having to manage two endpoints, one for Nova and one for Cinder, is inefficient. These are the gaps I called out in the Raksha wiki page.
>> >
>> >>> Cinder has a cinder backup command that performs a volume backup to Swift, Ceph or TSM. The Ceph implementation also supports incremental backup (Ceph to Ceph).
>> >>> I envision that Cinder could be expanded to support incremental backup (for persistent storage) by adding drivers/plug-ins that will leverage incremental backup features of either the storage or hypervisors.
>> >>> Independently, in Havana the ability to do consistent volume snapshots was added to GlusterFS. I assume that this consistency support could be generalized to support other volume drivers, and be utilized as part of a backup code.
>> >
>> > I think we are talking about specific implementations here. Yes, I am aware of the Ceph blueprint to support incremental backup, but the Cinder backup APIs are volume specific. That means if a VM has multiple volumes mapped, as in case 2 above, the tenant needs to call the backup API three times. Also, if you look at the Swift layout of Cinder backups, it is very difficult to tie the Swift images back to a particular VM. Imagine a tenant wants to restore a VM and all its resources from a backup copy that was taken a week ago. The restore operation is not straightforward.
>> > It is my understanding that consistency should be maintained at the VM level, not at the level of individual volumes. It is very difficult to make assumptions about how the application data inside the VM is laid out.
>> >
>> >>> Looking at the key features in Raksha, it seems that the main features (2,3,4,7) could be addressed by improving the current mechanisms in Nova and Cinder. I didn't include 1 as a feature as it is more a statement of intent (or goal) than a feature.
>> >>> Features 5 (dedup) and 6 (scheduler) are indeed new in your proposal.
>> >
>> >>> Looking at the source de-duplication feature, and taking Swift as an example, it seems reasonable that if Swift implements de-duplication, then doing backup to Swift will give us de-duplication for free.
>> >>> In fact it would make sense to do the de-duplication at the Swift level instead of just the backup layer to gain more de-duplication opportunities.
>> >
>> > I agree; however, Swift is not the only object store that needs to support dedupe. Ceph is another popular object store. GlusterFS supports a Swift endpoint, and there are other commercially available object stores too. So your argument becomes very product specific. However, source-level dedupe is different from dedupe at rest. Source-level dedupe reduces the backup window and also reduces the amount of data that needs to be pumped to a backup endpoint like Swift.
>> >
>> >>> Following the above, and assuming it all comes true (at times I am known to be an optimist), then we are left with backup job scheduling, and I wonder if that is enough for a new project.
>> >
>> > I hope I have convinced you that Raksha has more to offer than a simple cron job. Please take a look at the backup APIs, the database schema, and the use cases it addresses in its wiki page.
>> >
>> > The bottom line is that, irrespective of how OpenStack is deployed, here is what the Raksha workflow looks like:
>> > * Create-backupjob VM1, VM2
>> >   --> Returns backup job id, id1
>> > * Run-backupjob id1
>> >   --> Returns run id rid1
>> > * Run-backupjob id1
>> >   --> Returns run id rid2
>> >
>> > * Restore rid1
>> >   --> Restores PiT of VM1 and VM2 and their associated volumes
>> >
>> >>> My question is, would it make sense to add to the current mechanisms in Nova and Cinder rather than add the complexity of a new project?
>> >
>> > I think the answer is yes :)
>> >
>> > Regards,
>> > Murali Balcha
>> >>> __________________________________________
>> >>> Ronen I. Kat
>> >>> Storage Research
>> > IBM Research - Haifa
>> > Phone: +972.3.7689493
>> > Email: ronen...@il.ibm.com
>> >
>> > From: Murali Balcha <murali.bal...@triliodata.com>
>> > To: "openstack-dev@lists.openstack.org" <openstack-dev@lists.openstack.org>, "openst...@list.openstack.org" <openst...@list.openstack.org>
>> > Date: 29/08/2013 01:18 AM
>> > Subject: [openstack-dev] Proposal for Raksha, a Data Protection As a Service project
>> >
>> > Hello Stackers,
>> > We would like to introduce a new project, Raksha, a Data Protection As a Service (DPaaS) for OpenStack Cloud.
>> > Raksha’s primary goal is to provide comprehensive data protection for OpenStack by leveraging Nova, Swift, Glance and Cinder. Raksha has the following key features:
>> > 1. Provide enterprise-grade data protection for OpenStack-based clouds
>> > 2. Tenant-administered backups and restores
>> > 3. Application-consistent backups
>> > 4. Point In Time (PiT) full and incremental backups and restores
>> > 5. Dedupe at source for efficient backups
>> > 6. A job scheduler for periodic backups
>> > 7. Noninvasive backup solution that does not require service interruption during the backup window
>> >
>> > You will find the rationale behind the need for Raksha in OpenStack in its wiki. The wiki also has the preliminary design and the API description. Some of the Raksha functionality may overlap with the Nova and Cinder projects, and as a community let's work together to coordinate the features among these projects. We would like to seek out early feedback so we can address as many issues as we can in the first code drop. We are hoping to enlist the OpenStack community's help in making Raksha a part of OpenStack.
>> > Raksha’s project resources:
>> > Wiki: https://wiki.openstack.org/wiki/Raksha
>> > Launchpad: https://launchpad.net/raksha
>> > Github: https://github.com/DPaaS-Raksha/Raksha (We will upload prototype code in a few days)
>> > If you want to talk to us, send an email to openstack-...@lists.launchpad.net with "[raksha]" in the subject or use the #openstack-raksha IRC channel.
>> >
>> > Best Regards,
>> > Murali Balcha
>> > _______________________________________________
>> > OpenStack-dev mailing list
>> > OpenStack-dev@lists.openstack.org
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> Hi Murali,
>
> This sounds pretty neat, but in my opinion it seems that we have most of the items in your list covered with the Cinder backup service. As far as backing up instances goes, I'm personally not sure about backing up ephemeral objects. We already have the ability to create and upload an image, which is "kind of a backup". Also, if you want a persistent instance, wouldn't it be better to have it reside on persistent storage and back that up?
>
> Anyway, my personal thought is that it might be more efficient to see how things develop with the backup service in Cinder.
>
> As far as the deduplication idea goes, I really think that's much better done on the target rather than trying to process it on the source. Processing this in the backup service is pretty expensive and there are a lot of trouble spots, not the least of which is a pretty big hit on performance.
>
> Also, as was pointed out, there are quite a few efficiencies and optimizations that can be realized by leaving the work closer to the backend storage itself. A number of cases with good optimizations have already been pointed out; in addition, a number of back-ends already in Cinder have plans for further enhancements/optimizations as well.
>
> Anyway, that's just my opinion. I'd be really interested in talking with you more (maybe at the summit) regarding some of the work you're doing and some of the ideas that you have. It would be interesting to see what we could do to improve the backup service already in Cinder.
>
> Thanks,
> John

Great, I think this will be a good summit topic for sure.
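For readers following the thread, the job/run/restore workflow quoted above (create a backup job for VM1 and VM2, run it twice, restore from the first run) can be sketched roughly as below. This is only an illustration of the bookkeeping, not Raksha's actual API: `BackupService`, `create_backup_job`, `run_backup_job` and `restore` are hypothetical names, nothing here talks to Nova, Cinder or Neutron, and the rule "restore networking when a job covers more than one VM" is an assumption made to mirror use case 3.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class BackupJob:
    job_id: str
    vms: tuple  # names of the VMs protected by this job


@dataclass
class BackupService:
    """Hypothetical in-memory stand-in; makes no OpenStack calls."""
    jobs: dict = field(default_factory=dict)   # job id -> BackupJob
    runs: dict = field(default_factory=dict)   # run id -> job id
    _job_seq: count = field(default_factory=lambda: count(1))
    _run_seq: count = field(default_factory=lambda: count(1))

    def create_backup_job(self, *vms):
        """Register a job covering the given VMs and return its id."""
        job = BackupJob(f"id{next(self._job_seq)}", tuple(vms))
        self.jobs[job.job_id] = job
        return job.job_id

    def run_backup_job(self, job_id):
        """Record one point-in-time run of a job and return the run id."""
        if job_id not in self.jobs:
            raise KeyError(f"unknown backup job: {job_id}")
        run_id = f"rid{next(self._run_seq)}"
        self.runs[run_id] = job_id
        return run_id

    def restore(self, run_id):
        """Return the ordered (service, action) steps for restoring a run."""
        job = self.jobs[self.runs[run_id]]
        steps = []
        for vm in job.vms:
            # Instance first, then its volumes, as described in the thread.
            steps.append(("nova", f"restore instance {vm}"))
            steps.append(("cinder", f"restore and attach volumes of {vm}"))
        if len(job.vms) > 1:
            # Assumption: multi-VM jobs share a private network (use case 3).
            steps.append(("neutron", "restore shared network configuration"))
        return steps


svc = BackupService()
job = svc.create_backup_job("VM1", "VM2")
first_run = svc.run_backup_job(job)
second_run = svc.run_backup_job(job)
for service, action in svc.restore(first_run):
    print(service, "->", action)
```

The point of the sketch is only that a restore is addressed by a single run id, while the per-service calls behind it are derived from the job definition rather than issued one by one by the tenant.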