Replied as inline comments.

On Thu, Apr 17, 2014 at 9:33 PM, lihuiba <magazine.lihu...@163.com> wrote:
>>IMO we'd better use a backend-storage-optimized approach to access
>>remote images from compute nodes instead of using iSCSI only. And from
>>my experience, I'm sure iSCSI lacks stability under heavy I/O workload
>>in production environments; it can cause either the VM filesystem to be
>>marked read-only or a VM kernel panic.
>
> Yes, in this situation the problem lies in the backend storage, so no other
> protocol will perform better. However, P2P transferring will greatly reduce
> the workload on the backend storage and thus increase responsiveness.
>

It's not 100% true, in my case at least. We fixed this problem in the network
interface driver; it was actually the driver causing kernel panics and
read-only issues under heavy networking workload.

>>As I said, Nova currently already has an image caching mechanism, so in this
>>case P2P is just an approach that could be used for downloading or preheating
>>the image cache.
>
> Nova's image caching is file-level, while VMThunder's is block-level. And
> VMThunder is designed to work in conjunction with Cinder, not Glance.
> VMThunder currently uses Facebook's flashcache to realize caching, and
> dm-cache and bcache are also options in the future.
>

Hmm, if you mention bcache, dm-cache and flashcache, I'm wondering whether they
could be leveraged at the operation/best-practice level. Btw, we are doing some
work to make Glance integrate Cinder as a unified block storage backend.
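For instance, at the operation level, block-level caching of an attached image
volume could be assembled with stock flashcache roughly as in the sketch below
(the device paths and cache name are placeholders for illustration, not how
VMThunder actually wires things up; dm-cache or bcache could play the same role
with their own tools):

    import subprocess

    # Placeholders: a local SSD partition used as the cache, and the
    # iSCSI-attached image volume to be cached.
    ssd_dev = "/dev/sdb1"
    image_lun = ("/dev/disk/by-path/"
                 "ip-192.0.2.10:3260-iscsi-iqn.2014-04.org.example:img-lun-1")

    # Create a write-through flashcache device named "img_cache"; reads that
    # miss the local SSD are fetched from the remote LUN and cached locally.
    subprocess.check_call(["flashcache_create", "-p", "thru",
                           "img_cache", ssd_dev, image_lun])

    # VM disks are then layered (e.g. via a CoW snapshot) on top of
    # /dev/mapper/img_cache, so hot image blocks are served from the local
    # cache instead of hitting the backend storage on every read.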
>
>>I think P2P transferring/pre-caching sounds a good way to go, as I mentioned
>>as well, but actually for this area I'd like to see something like zero-copy
>>+ CoR. On one hand we can leverage on-demand downloading of image bits via
>>the zero-copy approach; on the other hand we can avoid reading data from the
>>remote image every time thanks to CoR.
>
> Yes, on-demand transferring is what you mean by "zero-copy", and caching
> is something close to CoR. In fact, we are working on a kernel module called
> foolcache that realizes true CoR. See
> https://github.com/lihuiba/dm-foolcache.
>

Yup. And it's really interesting to me; I will take a look. Thanks for sharing.
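For comparison, a rough approximation of that zero-copy + CoR idea using only
stock QEMU features (not dm-foolcache) would be a qcow2 overlay backed by the
attached image volume, with QEMU's copy-on-read enabled so that blocks read
from the backing device get written into the local overlay. A minimal sketch,
with placeholder paths:

    import subprocess

    # Placeholder paths: the attached (and possibly cached) base image volume,
    # and a local per-instance overlay file.
    base_lun = "/dev/mapper/img_cache"
    overlay = "/var/lib/nova/instances/demo/disk.qcow2"

    # Create an empty qcow2 overlay whose backing file is the raw base volume;
    # nothing is copied up front (the zero-copy part).
    subprocess.check_call([
        "qemu-img", "create", "-f", "qcow2",
        "-o", "backing_file=%s,backing_fmt=raw" % base_lun,
        overlay])

    # If the guest is then started with copy-on-read enabled on this drive
    # (QEMU's copy-on-read=on option), blocks read from the backing volume are
    # also written into the overlay, so later reads are local -- the CoR part.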
>
> National Key Laboratory for Parallel and Distributed
> Processing, College of Computer Science, National University of Defense
> Technology, Changsha, Hunan Province, P.R. China
> 410073
>
>
> At 2014-04-17 17:11:48, "Zhi Yan Liu" <lzy....@gmail.com> wrote:
>>On Thu, Apr 17, 2014 at 4:41 PM, lihuiba <magazine.lihu...@163.com> wrote:
>>>>IMHO, zero-copy approach is better
>>> VMThunder's "on-demand transferring" is the same thing as your "zero-copy
>>> approach". VMThunder uses iSCSI as the transfer protocol, which is your
>>> option #b.
>>>
>>
>>IMO we'd better use a backend-storage-optimized approach to access
>>remote images from compute nodes instead of using iSCSI only. And from
>>my experience, I'm sure iSCSI lacks stability under heavy I/O workload
>>in production environments; it can cause either the VM filesystem to be
>>marked read-only or a VM kernel panic.
>>
>>>
>>>>Under the #b approach, my former experience from a previous similar
>>>>cloud deployment (not OpenStack) was that with 2 PC-server storage
>>>>nodes (plain *local SAS disks*, without any storage backend) +
>>>>2-way/multi-path iSCSI + 1G network bandwidth, we could provision 500
>>>>VMs in a minute.
>>> Suppose booting one instance requires reading 300MB of data; then 500 of
>>> them require 150GB. Each of the storage servers needs to send data at a
>>> rate of 150GB/2/60 = 1.25GB/s on average. This is an extremely heavy
>>> burden even for high-end storage appliances. In production systems, such
>>> a request (booting 500 VMs in one shot) would significantly disturb other
>>> running instances accessing the same storage nodes.
>>>

Btw, I believe the numbers in this case are not accurate either, since remote
image bits could be loaded on demand instead of being loaded entirely at boot
stage.

zhiyan
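For reference, the arithmetic above, together with the on-demand caveat, in one
place:

    # Back-of-the-envelope check of the numbers above.
    vms = 500              # instances booted in one shot
    mb_per_boot = 300.0    # data read per boot, assuming a full copy
    servers = 2            # storage servers sharing the load
    window_s = 60          # one minute

    total_gb = vms * mb_per_boot / 1000       # 150 GB in total
    per_server_gb_s = total_gb / servers / window_s
    print("%.2f GB/s per server" % per_server_gb_s)   # 1.25 GB/s

    # With on-demand (zero-copy) loading, only the blocks actually touched
    # during boot are transferred, so the effective volume per boot can be far
    # below 300 MB and the peak load correspondingly lower.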
>>> VMThunder eliminates this problem by P2P transferring and on-compute-node
>>> caching. Even a PC server with a single 1Gb NIC (a true PC server!) can
>>> boot 500 VMs in a minute with ease. For the first time, VMThunder makes
>>> bulk provisioning of VMs practical for production cloud systems. This is
>>> the essential value of VMThunder.
>>>
>>
>>As I said, Nova currently already has an image caching mechanism, so in this
>>case P2P is just an approach that could be used for downloading or preheating
>>the image cache.
>>
>>I think P2P transferring/pre-caching sounds a good way to go, as I mentioned
>>as well, but actually for this area I'd like to see something like zero-copy
>>+ CoR. On one hand we can leverage on-demand downloading of image bits via
>>the zero-copy approach; on the other hand we can avoid reading data from the
>>remote image every time thanks to CoR.
>>
>>zhiyan
>>
>>>
>>> ===================================================
>>> From: Zhi Yan Liu <lzy....@gmail.com>
>>> Date: 2014-04-17 0:02 GMT+08:00
>>> Subject: Re: [openstack-dev] [Nova][blueprint] Accelerate the booting
>>> process of a number of vms via VMThunder
>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>> <openstack-dev@lists.openstack.org>
>>>
>>>
>>> Hello Yongquan Fu,
>>>
>>> My thoughts:
>>>
>>> 1. Nova already supports an image caching mechanism. It caches the image
>>> on the compute host that a VM was provisioned from, so the next
>>> provisioning (booting the same image) doesn't need to transfer it again,
>>> as long as the cache manager hasn't cleaned it up.
>>> 2. P2P transferring and prefetching are still based on a copy mechanism;
>>> IMHO a zero-copy approach is better, and even transferring/prefetching
>>> could be optimized by such an approach. (I have not checked VMThunder's
>>> "on-demand transferring", but it is a kind of transferring as well, at
>>> least going by its literal meaning.)
>>> And btw, IMO there are two ways we can go to follow the zero-copy idea:
>>> a. When Nova and Glance use the same backend storage, we could use the
>>> storage's own CoW/snapshot capability to prepare the VM disk instead of
>>> copying/transferring image bits (through HTTP/the network or a local
>>> copy).
>>> b. Without "unified" storage, we could attach a volume/LUN from the
>>> backend storage to the compute node as a base image, then do the same
>>> CoW/snapshot on it to prepare the root/ephemeral disk of the VM. This is
>>> much like boot-from-volume, but the difference is that we do the
>>> CoW/snapshot on the Nova side instead of the Cinder/storage side.
>>>
>>> For option #a, we have already got some progress:
>>> https://blueprints.launchpad.net/nova/+spec/image-multiple-location
>>> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler
>>> https://blueprints.launchpad.net/nova/+spec/vmware-clone-image-handler
>>>
>>> Under the #b approach, my former experience from a previous similar cloud
>>> deployment (not OpenStack) was that with 2 PC-server storage nodes (plain
>>> *local SAS disks*, without any storage backend) + 2-way/multi-path iSCSI
>>> + 1G network bandwidth, we could provision 500 VMs in a minute.
>>>
>>> On the vmThunder topic, I think it sounds like a good idea; IMO P2P and
>>> prefetching are valuable optimizations for image transferring.
>>>
>>> zhiyan
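To make option #a a bit more concrete: with a shared Ceph RBD backend,
preparing a VM disk becomes a snapshot + clone rather than a copy, roughly as
sketched below (pool/image names are placeholders; as far as I understand, the
rbd-clone-image-handler blueprint above is about doing this automatically
inside Nova):

    import subprocess

    # Placeholder pool/image names: a Glance image stored in RBD, and a
    # per-instance disk created as a CoW clone of its snapshot.
    image = "images/fedora-20"
    snap = image + "@base"
    vm_disk = "vms/instance-0001_disk"

    # Snapshot the image once and protect the snapshot so it can be cloned.
    subprocess.check_call(["rbd", "snap", "create", snap])
    subprocess.check_call(["rbd", "snap", "protect", snap])

    # Each VM disk is then a copy-on-write clone of that snapshot: creation is
    # nearly instant and no image bits are transferred to the compute node.
    subprocess.check_call(["rbd", "clone", snap, vm_disk])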
>>>
>>> On Wed, Apr 16, 2014 at 9:14 PM, yongquan Fu <quanyo...@gmail.com> wrote:
>>>>
>>>> Dear all,
>>>>
>>>> We would like to present an extension to the vm-booting functionality of
>>>> Nova for the case when a number of homogeneous vms need to be launched
>>>> at the same time.
>>>>
>>>> The motivation for our work is to increase the speed of provisioning vms
>>>> for large-scale scientific computing and big data processing. In those
>>>> cases, we often need to boot tens or hundreds of virtual machine
>>>> instances at the same time.
>>>>
>>>> Currently, under OpenStack, we found that creating a large number of
>>>> virtual machine instances is very time-consuming. The reason is that the
>>>> booting procedure is a centralized operation that involves performance
>>>> bottlenecks. Before a virtual machine can actually be started, OpenStack
>>>> either copies the image file (swift) or attaches the image volume
>>>> (cinder) from the storage server to the compute node via the network.
>>>> Booting a single VM requires reading a large amount of image data from
>>>> the image storage server, so creating a large number of virtual machine
>>>> instances causes a significant workload on those servers. The servers
>>>> become quite busy, even unavailable, during the deployment phase, and it
>>>> takes a very long time before the whole virtual machine cluster is
>>>> usable.
>>>>
>>>> Our extension is based on our work on vmThunder, a novel mechanism for
>>>> accelerating the deployment of large numbers of virtual machine
>>>> instances. It is written in Python and can be integrated with OpenStack
>>>> easily. VMThunder addresses the problem described above with the
>>>> following improvements: on-demand transferring (network-attached
>>>> storage), compute-node caching, P2P transferring, and prefetching.
>>>> VMThunder is a scalable and cost-effective accelerator for bulk
>>>> provisioning of virtual machines.
>>>>
>>>> We hope to receive your feedback. Any comments are extremely welcome.
>>>> Thanks in advance.
>>>>
>>>> PS:
>>>>
>>>> VMThunder enhanced nova blueprint:
>>>> https://blueprints.launchpad.net/nova/+spec/thunderboost
>>>> VMThunder standalone project: https://launchpad.net/vmthunder
>>>> VMThunder prototype: https://github.com/lihuiba/VMThunder
>>>> VMThunder etherpad: https://etherpad.openstack.org/p/vmThunder
>>>> VMThunder portal: http://www.vmthunder.org/
>>>> VMThunder paper:
>>>> http://www.computer.org/csdl/trans/td/preprint/06719385.pdf
>>>>
>>>> Regards
>>>>
>>>> vmThunder development group
>>>> PDL
>>>> National University of Defense Technology
>>>>
>>>
>>> --
>>> Yongquan Fu
>>> PhD, Assistant Professor,
>>> National Key Laboratory for Parallel and Distributed
>>> Processing, College of Computer Science, National University of Defense
>>> Technology, Changsha, Hunan Province, P.R. China
>>> 410073
>>>

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev