For example, Punith from CloudByte sent out an e-mail yesterday that was very similar to this thread, but he was wondering how to implement such a concept on his company's SAN technology.
On Mon, Feb 16, 2015 at 10:40 AM, Mike Tutkowski <mike.tutkow...@solidfire.com> wrote:

> Yeah, I think it's a similar concept, though.
>
> You would want to take snapshots on Ceph (or some other backend system that acts as primary storage) instead of copying data to secondary storage and calling it a snapshot.
>
> For Ceph or any other backend system like that, the idea is to speed up snapshots by not requiring CPU cycles on the front end or network bandwidth to transfer the data.
>
> In that sense, this is a general-purpose CloudStack problem, and it appears you are intending to discuss only the Ceph implementation here, which is fine.
>
> On Mon, Feb 16, 2015 at 10:34 AM, Logan Barfield <lbarfi...@tqhosting.com> wrote:
>
>> Hi Mike,
>>
>> I think the interest in this issue is primarily for Ceph RBD, which doesn't use iSCSI or SAN concepts in general. As well, I believe RBD is currently only supported in KVM (and VMware?). QEMU has native RBD support, so it attaches the devices directly to the VMs in question. It also natively supports snapshotting, which is what this discussion is about.
>>
>> On Mon, Feb 16, 2015 at 11:46 AM, Mike Tutkowski <mike.tutkow...@solidfire.com> wrote:
>>
>>> I should have also commented on KVM (since that was the hypervisor called out in the initial e-mail).
>>>
>>> In my situation, most of my customers use XenServer and/or ESXi, so KVM has received the fewest of my cycles with regard to those three hypervisors.
>>>
>>> KVM, though, is actually the simplest hypervisor for which to implement these changes (since I am using the iSCSI adapter of the KVM agent and it essentially just passes my LUN to the VM in question).
>>>
>>> For KVM, there is no clustered file system applied to my backend LUN, so I don't have to "worry" about that layer.
>>>
>>> I don't see any hurdles like *immutable* UUIDs of SRs and VDIs (as is the case with XenServer) or having to re-signature anything (as is the case with ESXi).
>>>
>>> On Mon, Feb 16, 2015 at 9:33 AM, Mike Tutkowski <mike.tutkow...@solidfire.com> wrote:
>>>
>>>> I have been working on this on and off for a while now (as time permits).
>>>>
>>>> Here is an e-mail I sent to a customer of ours that helps describe some of the issues:
>>>>
>>>> *** Beginning of e-mail ***
>>>>
>>>> The main requests were around the following features:
>>>>
>>>> * The ability to leverage SolidFire snapshots.
>>>>
>>>> * The ability to create CloudStack templates from SolidFire snapshots.
>>>>
>>>> I had these on my roadmap, but bumped the priority up and began work on them for the CS 4.6 release.
>>>>
>>>> During design, I realized there were issues with the way XenServer is architected that prevented me from directly using SolidFire snapshots.
>>>>
>>>> I could definitely take a SolidFire snapshot of a SolidFire volume, but this snapshot would not be usable from XenServer if the original volume was still in use.
>>>>
>>>> Here is the gist of the problem:
>>>>
>>>> When XenServer leverages an iSCSI target such as a SolidFire volume, it applies a clustered file system to it, which they call a storage repository (SR). An SR has an *immutable* UUID associated with it.
>>>>
>>>> The virtual volume (which a VM sees as a disk) is represented by a virtual disk image (VDI) in the SR.
>>>> A VDI also has an *immutable* UUID associated with it.
>>>>
>>>> If I take a snapshot (or a clone) of the SolidFire volume and then later try to use that snapshot from XenServer, XenServer complains that the SR on the snapshot has a UUID that conflicts with an existing UUID.
>>>>
>>>> In other words, it is not possible to use the original SR and the snapshot of this SR from XenServer at the same time, which is critical in a cloud environment (to enable creating templates from snapshots).
>>>>
>>>> The way I have proposed circumventing this issue is not ideal, but it technically works (this code is checked into the CS 4.6 branch):
>>>>
>>>> When the time comes to take a CloudStack snapshot of a CloudStack volume that is backed by SolidFire storage via the storage plug-in, the plug-in will create a new SolidFire volume with characteristics (size and IOPS) equal to those of the original volume.
>>>>
>>>> We then have XenServer attach to this new SolidFire volume, create a *new* SR on it, and then copy the VDI from the source SR to the destination SR (the new SR).
>>>>
>>>> This leads to us having a copy of the VDI (a "snapshot" of sorts), but it requires CPU cycles on the compute cluster as well as network bandwidth to write to the SAN (thus it is slower and more resource intensive than a SolidFire snapshot).
>>>>
>>>> I spoke with Tim Mackey (who works on XenServer at Citrix) concerning this issue before and during the CloudStack Collaboration Conference in Budapest in November. He agreed that this is a legitimate issue with the way XenServer is designed and could not think of a way (other than what I was doing) to get around it in current versions of XenServer.
>>>>
>>>> One thought is to have a feature added to XenServer that enables you to change the UUID of an SR and of a VDI.
>>>>
>>>> If I could do that, then I could take a SolidFire snapshot of the SolidFire volume and issue commands to XenServer to have it change the UUIDs of the original SR and the original VDI. I could then record the necessary UUID info in the CS DB.
>>>>
>>>> *** End of e-mail ***
>>>>
>>>> I have since investigated this on ESXi.
>>>>
>>>> ESXi does have a way for us to "re-signature" a datastore, so backend snapshots can be taken and effectively used on this hypervisor.
>>>>
>>>> On Mon, Feb 16, 2015 at 8:19 AM, Logan Barfield <lbarfi...@tqhosting.com> wrote:
>>>>
>>>>> I'm just going to stick with the qemu-img option change for RBD for now (which should cut snapshot time down drastically) and look forward to this in the future. I'd be happy to help get this moving, but I'm not enough of a developer to lead the charge.
>>>>>
>>>>> As far as renaming goes, I agree that maybe "backups" isn't the right word. That being said, calling a full-sized copy of a volume a "snapshot" also isn't the right word. Maybe "image" would be better?
>>>>>
>>>>> I've also got my reservations about "accounts" vs. "users" (I think "departments" and "accounts or users", respectively, would be less confusing), but that's a different thread.
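[For illustration only, not part of the original messages: the copy-based workaround described in the quoted e-mail above corresponds roughly to the sketch below, using the XenAPI Python bindings. The host, credentials, and UUIDs are placeholders, and the real CloudStack plug-in drives this through its own orchestration rather than a standalone script.]

    # Rough, hedged illustration of the "new SR + VDI copy" workaround.
    # All host names, credentials, and UUIDs below are placeholders.
    import XenAPI

    session = XenAPI.Session("https://xenserver-pool-master")  # placeholder host
    session.xenapi.login_with_password("root", "password")     # placeholder creds
    try:
        # The plug-in has already created a new SolidFire volume and a new SR
        # on it; here we simply look both objects up by UUID.
        src_vdi = session.xenapi.VDI.get_by_uuid("SOURCE-VDI-UUID")
        dest_sr = session.xenapi.SR.get_by_uuid("NEW-SR-UUID")

        # VDI.copy writes a full copy of the disk into the destination SR.
        # This is the step that consumes compute-cluster CPU and SAN bandwidth,
        # which is why it is slower than a native SolidFire snapshot.
        new_vdi = session.xenapi.VDI.copy(src_vdi, dest_sr)
        print("Copied VDI:", session.xenapi.VDI.get_uuid(new_vdi))
    finally:
        session.xenapi.session.logout()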
>>>>> On Mon, Feb 16, 2015 at 10:04 AM, Wido den Hollander <w...@widodh.nl> wrote:
>>>>>
>>>>>> On 16-02-15 15:38, Logan Barfield wrote:
>>>>>>
>>>>>>> I like this idea a lot for Ceph RBD. I do think there should still be support for copying snapshots to secondary storage as needed (for transfers between zones, etc.). I really think that this could be part of a larger move to clarify the naming conventions used for disk operations. Currently, "Volume Snapshots" should probably really be called "Backups". So having "snapshot" functionality, and a "convert snapshot to backup/template" operation, would be a good move.
>>>>>>
>>>>>> I fully agree that this would be a very great addition.
>>>>>>
>>>>>> I won't be able to work on this any time soon though.
>>>>>>
>>>>>> Wido
>>>>>>
>>>>>>> On Mon, Feb 16, 2015 at 9:16 AM, Andrija Panic <andrija.pa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> BIG +1
>>>>>>>>
>>>>>>>> My team should submit a patch to ACS for better KVM snapshots, including whole-VM snapshots, etc., but it's too early to give details...
>>>>>>>>
>>>>>>>> best
>>>>>>>>
>>>>>>>> On 16 February 2015 at 13:01, Andrei Mikhailovsky <and...@arhont.com> wrote:
>>>>>>>>
>>>>>>>>> Hello guys,
>>>>>>>>>
>>>>>>>>> I was hoping to have some feedback from the community on the subject of having the ability to keep snapshots on the primary storage where it is supported by the storage backend.
>>>>>>>>>
>>>>>>>>> The idea behind this functionality is to improve how snapshots are currently handled on KVM hypervisors with Ceph primary storage. At the moment, snapshots are taken on the primary storage and then copied to the secondary storage. This method is very slow and inefficient even on a small infrastructure. Even on medium deployments, using snapshots in KVM becomes nearly impossible. If you have tens or hundreds of concurrent snapshots taking place, you will have a bunch of timeouts and errors, your network becomes clogged, etc. In addition, using these snapshots for creating new volumes or reverting VMs is also slow and inefficient. As above, when you have tens or hundreds of concurrent operations, they will not succeed and a majority of tasks will end with errors or timeouts.
>>>>>>>>>
>>>>>>>>> At the moment, taking a single snapshot of relatively small volumes (200GB or 500GB, for instance) takes tens if not hundreds of minutes. Taking a snapshot of the same volume on Ceph primary storage takes a few seconds at most! Similarly, converting a snapshot to a volume takes tens if not hundreds of minutes when secondary storage is involved, compared with seconds if done directly on the primary storage.
>>>>>>>>>
>>>>>>>>> I suggest that CloudStack should have the ability to keep volume snapshots on the primary storage where this is supported by the storage.
>>>>>>>>> Perhaps there could be a per-primary-storage setting that enables this functionality. This would be beneficial for Ceph primary storage on KVM hypervisors, and perhaps on XenServer when Ceph is supported there in the near future.
>>>>>>>>>
>>>>>>>>> This will greatly speed up the process of using snapshots on KVM, and users will actually start using snapshotting rather than giving up in frustration.
>>>>>>>>>
>>>>>>>>> I have opened the ticket CLOUDSTACK-8256, so please cast your vote if you are in agreement.
>>>>>>>>>
>>>>>>>>> Thanks for your input
>>>>>>>>>
>>>>>>>>> Andrei
>>>>>>>>
>>>>>>>> --
>>>>>>>> Andrija Panić

--
*Mike Tutkowski*
*Senior CloudStack Developer, SolidFire Inc.*
e: mike.tutkow...@solidfire.com
o: 303.746.7302
Advancing the way the world uses the cloud
<http://solidfire.com/solution/overview/?video=play>*™*
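[For illustration only, not from the original messages: at the RBD level, the "keep snapshots on primary storage" flow discussed in this thread amounts to an in-place snapshot plus a copy-on-write clone. A minimal sketch using the python-rbd bindings might look like the following; the pool, image, and snapshot names are placeholders, not values taken from the thread.]

    # Hedged sketch of a primary-storage snapshot and clone on Ceph RBD.
    # Pool, image, and snapshot names below are illustrative placeholders.
    import rados
    import rbd

    cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx("cloudstack")  # assumed primary-storage pool
        try:
            with rbd.Image(ioctx, "volume-1234") as image:
                # The snapshot is taken in place on the primary storage; no data
                # is copied to secondary storage, so it completes in seconds.
                image.create_snap("cs-snap-1")
                # A snapshot must be protected before it can be cloned.
                image.protect_snap("cs-snap-1")

            # Creating a new volume from the snapshot is a copy-on-write clone,
            # again without moving any data off the Ceph cluster.
            rbd.RBD().clone(ioctx, "volume-1234", "cs-snap-1",
                            ioctx, "volume-from-snap-1")
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()

Because neither the snapshot nor the clone ever leaves the Ceph cluster, both operations finish in seconds regardless of volume size, which is the behaviour contrasted above with the tens or hundreds of minutes needed when secondary storage is involved.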