+openstack-operators to see if others have the same use case
On 5/31/2018 5:14 PM, Moore, Curt wrote:
We recently upgraded from Liberty to Pike and, looking ahead to the
code in Queens, noticed the image download deprecation notice with
instructions to post here if this interface was in use. As such, I’d
like to explain our use case and see if there is a better way of
accomplishing our goal, or lobby for the "un-deprecation" of this
extension point.
Thanks for speaking up - this is much easier *before* code is removed.
As with many installations, we are using Ceph for both our Glance image
store and VM instance disks. In a normal workflow, when both Glance and
libvirt are configured to use Ceph, libvirt reacts to the direct_url
field on the Glance image and performs an in-place clone of the RAW
disk image from the images pool into the vms pool, all within Ceph. The
snapshot creation process is very fast and is thinly provisioned as
it’s a COW snapshot.
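For reference, here is a minimal sketch of what that clone boils down
to with the rbd Python bindings (the conffile path, pool names, the
'snap' snapshot name and the image/instance identifiers are just
illustrative assumptions, not our actual code):

    import rados
    import rbd

    # Illustrative only: conffile, pool and image names are assumptions.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    try:
        images_ioctx = cluster.open_ioctx('images')
        vms_ioctx = cluster.open_ioctx('vms')
        try:
            # Glance keeps a protected snapshot (commonly named 'snap') on
            # each RAW image; cloning from it is copy-on-write, so no data
            # is copied up front and the new disk appears almost instantly.
            rbd.RBD().clone(images_ioctx, '<glance-image-id>', 'snap',
                            vms_ioctx, '<instance-uuid>_disk')
        finally:
            vms_ioctx.close()
            images_ioctx.close()
    finally:
        cluster.shutdown()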
This underlying workflow itself works great; the issue is with the
performance of the VM’s disk within Ceph, especially as the number of
nodes within the cluster grows. We have found, especially with Windows
VMs (largely as a result of I/O for the Windows pagefile), that the
performance of the Ceph cluster as a whole takes a very large hit in
keeping up with all of this I/O thrashing, especially when Windows is
booting. This is not the case with Linux VMs, as they do not use swap
as frequently as Windows nodes do with their pagefiles. Windows can be
run without a pagefile, but that leads to other oddities within
Windows.
I should also mention that in our case, the nodes themselves are
ephemeral and we do not care about live migration, etc.; we just want
raw performance.
As an aside on our Ceph setup, without getting into too many details:
we have very fast SSD-based Ceph nodes for this pool (separate crush
root, SSDs for both OSDs and journals, 2 replicas), interconnected on
the same switch backplane, each with bonded 10Gb uplinks to the switch.
Our Nova nodes are within the same datacenter (they also have bonded
10Gb uplinks to their switches) but are distributed across different
switches. We could move the Nova nodes to the same switch as the Ceph
nodes, but that is a larger logistical challenge, as it would mean
rearranging many servers to make space.
Back to our use case: in order to isolate this heavy I/O, a subset of
our compute nodes has a local SSD and is set to use qcow2 images
instead of rbd, so that libvirt will pull the image down from Glance
into the node’s local image cache and run the VM from the local SSD.
This allows Windows VMs to boot and perform their initial
cloudbase-init setup/reboot within ~20 sec vs. 4-5 min, regardless of
overall Ceph cluster load. Additionally, this prevents us from
"wasting" IOPS and instead keeps them local to the Nova node,
reclaiming the network bandwidth and Ceph IOPS for use by Cinder
volumes. This is essentially the use case outlined in the "Do
designate some non-Ceph compute hosts with low-latency local storage"
section here:
https://ceph.com/planet/the-dos-and-donts-for-ceph-for-openstack/
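For context, the split between node classes is really just the libvirt
image backend selection in nova.conf; a sketch of the relevant options
(the commented rbd values are assumptions shown for illustration, not
our exact settings):

    [libvirt]
    # SSD compute nodes: cache the image locally and run the VM as
    # qcow2 from the node-local SSD.
    images_type = qcow2

    # Ceph-backed compute nodes use the rbd backend instead, e.g.:
    # images_type = rbd
    # images_rbd_pool = vms
    # images_rbd_ceph_conf = /etc/ceph/ceph.conf
    # rbd_user = cinder
    # rbd_secret_uuid = <libvirt secret uuid>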
The challenge is that the Glance image transfer is _glacially slow_
when using the Glance HTTP API (~30 min for a 50GB Windows image; it’s
Windows, so it’s huge with all of the necessary tools installed). If
libvirt can instead perform an RBD export on the image using the image
download functionality, it is able to download the same image in ~30
sec. We have code that performs the direct download from Glance over
RBD, and it works great in our use case; it is very similar to the
code in this older patch:
https://review.openstack.org/#/c/44321/
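The core of it is nothing exotic; a rough, simplified sketch of the
RBD-export-to-local-file step (this is not the exact deprecated Nova
download-handler interface, and the function name, chunk size and
defaults are illustrative assumptions):

    import rados
    import rbd

    CHUNK = 8 * 1024 * 1024  # 8 MiB reads; arbitrary choice for the sketch


    def export_rbd_image(pool, image_name, snap_name, dst_path,
                         conffile='/etc/ceph/ceph.conf'):
        """Copy a RAW Glance image from RBD to a local file, chunk by chunk."""
        cluster = rados.Rados(conffile=conffile)
        cluster.connect()
        try:
            ioctx = cluster.open_ioctx(pool)
            try:
                image = rbd.Image(ioctx, image_name, snapshot=snap_name,
                                  read_only=True)
                try:
                    size = image.size()
                    with open(dst_path, 'wb') as dst:
                        offset = 0
                        while offset < size:
                            length = min(CHUNK, size - offset)
                            dst.write(image.read(offset, length))
                            offset += length
                finally:
                    image.close()
            finally:
                ioctx.close()
        finally:
            cluster.shutdown()

Because the read happens over the Ceph/storage network rather than
through the Glance HTTP API, this is what gets the 50GB image onto the
node in ~30 sec instead of ~30 min.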
It looks like at the time this had general approval (i.e. it wasn't
considered crazy) but was blocked simply due to the Havana feature
freeze. That's good to know.
We could look at attaching an additional ephemeral disk to the instance
and having cloudbase-init use it as the pagefile, but it appears that
if libvirt is using rbd for its images_type, _all_ disks must then come
from Ceph; there is no way at present to allow the VM image to run from
Ceph and have an ephemeral disk mapped in from node-local storage. Even
so, this would have the effect of "wasting" Ceph IOPS for the VM disk
itself, which could be better used for other purposes.
When you mentioned swap above I was thinking of something similar,
attaching a swap device, but as you've pointed out, all disks local to
the compute host are going to use the same image type backend, so you
can't have the root disk and swap/ephemeral disks using different
image backends.
Based on what I have explained about our use case, is there a
better/different way to accomplish the same goal without using the
deprecated image download functionality? If not, can we work to
"un-deprecate" the download extension point? Should I work to get the
code for this RBD download into the upstream repository?
I think you should propose your changes upstream with a blueprint, the
docs for the blueprint process are here:
https://docs.openstack.org/nova/latest/contributor/blueprints.html
Since it's not an API change, this might just be a specless blueprint,
but you'd need to write up the blueprint and probably post the PoC code
to Gerrit and then bring it up during the "Open Discussion" section of
the weekly nova meeting.
Once we can take a look at the code change, we can go from there on
whether or not to add that in-tree or go some alternative route.
Until that happens, I think we'll just say we won't remove that
deprecated image download extension code, but that won't hold for an
unlimited amount of time if you don't propose your changes upstream.
Is there going to be anything blocking or slowing you down on your end
with regard to contributing this change, like legal approval, license
agreements, etc.? If so, please be up front about that.
--
Thanks,
Matt