On Fri, Apr 18, 2014 at 10:53 AM, lihuiba <magazine.lihu...@163.com> wrote:
>>It's not 100% true, in my case at least. We fixed this problem in the
>>network interface driver; it was actually causing kernel panics and
>>read-only filesystems under heavy networking workload.
>
> Network traffic control could help. The point is to ensure no instance
> is starved to death. Traffic control can be done with tc.
>
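For illustration, a per-interface rate limit with tc might look roughly
like the sketch below. The interface name and the numbers are made-up
examples, not values from any deployment discussed in this thread:

    # Rough sketch: cap image/iSCSI traffic on one interface with a tc
    # token-bucket filter, so bulk image transfers cannot starve other
    # instances. Interface name and rate are illustrative only.
    import subprocess

    def limit_bandwidth(dev="eth1", rate="200mbit"):
        # Drop any existing root qdisc first (ignore the error if none exists).
        subprocess.call(["tc", "qdisc", "del", "dev", dev, "root"])
        # Attach a token-bucket filter that caps the sending rate on this
        # interface.
        subprocess.check_call(
            ["tc", "qdisc", "add", "dev", dev, "root", "tbf",
             "rate", rate, "burst", "256kb", "latency", "400ms"])

    if __name__ == "__main__":
        limit_bandwidth()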
btw, I see, but at the moment we have fixed it in the network interface
device driver itself, rather than working around it by throttling
network traffic.

>
>>btw, we are doing some work to make Glance use Cinder as a unified
>>block storage backend.
>
> That sounds interesting. Is there some more material?
>

Some work has already been done in Glance
(https://blueprints.launchpad.net/glance/+spec/glance-cinder-driver ),
but I'm sure more work still needs to be done. Some of it is still in
draft, and some dependencies need to be resolved as well.

>
> At 2014-04-18 06:05:23, "Zhi Yan Liu" <lzy....@gmail.com> wrote:
>>Replied as inline comments.
>>
>>On Thu, Apr 17, 2014 at 9:33 PM, lihuiba <magazine.lihu...@163.com> wrote:
>>>>IMO we'd better use a backend-storage-optimized approach to access
>>>>remote images from compute nodes instead of using iSCSI only. And
>>>>from my experience, I'm sure iSCSI is short on stability under heavy
>>>>I/O workload in production environments; it can cause either the VM
>>>>filesystem to be marked read-only or a VM kernel panic.
>>>
>>> Yes, in this situation the problem lies in the backend storage, so no
>>> other protocol will perform better. However, P2P transferring will
>>> greatly reduce the workload on the backend storage, so as to increase
>>> responsiveness.
>>>
>>
>>It's not 100% true, in my case at least. We fixed this problem in the
>>network interface driver; it was actually causing kernel panics and
>>read-only issues under heavy networking workload.
>>
>>>
>>>>As I said, Nova already has an image caching mechanism, so in this
>>>>case P2P is just an approach that could be used for downloading or
>>>>preheating the image cache.
>>>
>>> Nova's image caching is file-level, while VMThunder's is block-level.
>>> And VMThunder is meant to work in conjunction with Cinder, not Glance.
>>> VMThunder currently uses facebook's flashcache to realize caching, and
>>> dm-cache and bcache are also options in the future.
>>>
>>
>>Hmm, if you mention bcache, dm-cache and flashcache, I'm just wondering
>>whether they could be leveraged at the operations/best-practice level.
>>
>>btw, we are doing some work to make Glance use Cinder as a unified
>>block storage backend.
>>
>>>
>>>>I think P2P transferring/pre-caching sounds like a good way to go, as
>>>>I mentioned as well, but actually for this area I'd like to see
>>>>something like zero-copy + CoR. On one hand we can leverage the
>>>>capability of downloading image bits on demand with the zero-copy
>>>>approach; on the other hand we can avoid reading data from the remote
>>>>image every time by CoR.
>>>
>>> Yes, on-demand transferring is what you mean by "zero-copy", and
>>> caching is something close to CoR. In fact, we are working on a kernel
>>> module called foolcache that realizes a true CoR. See
>>> https://github.com/lihuiba/dm-foolcache.
>>>
>>
>>Yup. And it's really interesting to me, will take a look, thanks for
>>sharing.
>>
>>>
>>> National Key Laboratory for Parallel and Distributed Processing,
>>> College of Computer Science, National University of Defense
>>> Technology, Changsha, Hunan Province, P.R. China 410073
>>>
>>>
>>> At 2014-04-17 17:11:48, "Zhi Yan Liu" <lzy....@gmail.com> wrote:
>>>>On Thu, Apr 17, 2014 at 4:41 PM, lihuiba <magazine.lihu...@163.com>
>>>>wrote:
>>>>>>IMHO, zero-copy approach is better
>>>>> VMThunder's "on-demand transferring" is the same thing as your
>>>>> "zero-copy approach". VMThunder uses iSCSI as the transferring
>>>>> protocol, which is option #b of yours.
>>>>>
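(For illustration only: the kind of iSCSI attach that option #b implies
on a compute node might look roughly like the sketch below; the portal
address and target IQN are made up for the example.)

    # Rough sketch: attach a base-image LUN to the compute node over iSCSI,
    # so the hypervisor can read image blocks on demand instead of copying
    # the whole image first. Portal and IQN below are purely illustrative.
    import subprocess

    PORTAL = "192.168.0.10:3260"                      # storage node (example)
    TARGET = "iqn.2014-04.org.example:base-image-01"  # image LUN (example)

    def attach_base_image():
        # Discover the targets exported by the storage node.
        subprocess.check_call(
            ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL])
        # Log in; the LUN then appears as a local block device (e.g. /dev/sdX).
        subprocess.check_call(
            ["iscsiadm", "-m", "node", "-T", TARGET, "-p", PORTAL, "--login"])

    if __name__ == "__main__":
        attach_base_image()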
>>>>
>>>>IMO we'd better use a backend-storage-optimized approach to access
>>>>remote images from compute nodes instead of using iSCSI only. And
>>>>from my experience, I'm sure iSCSI is short on stability under heavy
>>>>I/O workload in production environments; it can cause either the VM
>>>>filesystem to be marked read-only or a VM kernel panic.
>>>>
>>>>>
>>>>>>Under the #b approach, my former experience from our previous
>>>>>>similar Cloud deployment (not OpenStack) was that: with 2 PC server
>>>>>>storage nodes (general *local SAS disk*, without any storage
>>>>>>backend) + 2-way/multi-path iSCSI + 1G network bandwidth, we could
>>>>>>provision 500 VMs in a minute.
>>>>> Suppose booting one instance requires reading 300MB of data, so 500
>>>>> of them require 150GB. Each storage server then needs to send data
>>>>> at a rate of 150GB/2/60 = 1.25GB/s on average. This is absolutely a
>>>>> heavy burden even for high-end storage appliances. In production
>>>>> systems, this request (booting 500 VMs in one shot) will
>>>>> significantly disturb other running instances accessing the same
>>>>> storage nodes.
>>>>>
>>
>>btw, I believe the numbers in this case don't hold either, since remote
>>image bits can be loaded on demand instead of all being loaded at boot
>>time.
>>
>>zhiyan
>>
>>>>> VMThunder eliminates this problem by P2P transferring and
>>>>> on-compute-node caching. Even a PC server with one 1Gb NIC (this is
>>>>> a true PC server!) can boot 500 VMs in a minute with ease. For the
>>>>> first time, VMThunder makes bulk provisioning of VMs practical for
>>>>> production cloud systems. This is the essential value of VMThunder.
>>>>>
>>>>
>>>>As I said, Nova already has an image caching mechanism, so in this
>>>>case P2P is just an approach that could be used for downloading or
>>>>preheating the image cache.
>>>>
>>>>I think P2P transferring/pre-caching sounds like a good way to go, as
>>>>I mentioned as well, but actually for this area I'd like to see
>>>>something like zero-copy + CoR. On one hand we can leverage the
>>>>capability of downloading image bits on demand with the zero-copy
>>>>approach; on the other hand we can avoid reading data from the remote
>>>>image every time by CoR.
>>>>
>>>>zhiyan
>>>>
>>>>>
>>>>> ===================================================
>>>>> From: Zhi Yan Liu <lzy....@gmail.com>
>>>>> Date: 2014-04-17 0:02 GMT+08:00
>>>>> Subject: Re: [openstack-dev] [Nova][blueprint] Accelerate the
>>>>> booting process of a number of vms via VMThunder
>>>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>>>> <openstack-dev@lists.openstack.org>
>>>>>
>>>>>
>>>>> Hello Yongquan Fu,
>>>>>
>>>>> My thoughts:
>>>>>
>>>>> 1. Currently Nova already supports an image caching mechanism. It
>>>>> can cache an image on a compute host that a VM has been provisioned
>>>>> from before, so the next provisioning (booting the same image)
>>>>> doesn't need to transfer it again, as long as the cache manager
>>>>> hasn't cleaned it up.
>>>>> 2. P2P transferring and prefetching are still based on a copy
>>>>> mechanism; IMHO, a zero-copy approach is better, and even
>>>>> transferring/prefetching could be optimized by such an approach. (I
>>>>> have not checked VMThunder's "on-demand transferring", but it is a
>>>>> kind of transferring as well, at least going by its literal
>>>>> meaning.)
>>>>>
>>>>> And btw, IMO, we have two ways we can go to follow the zero-copy
>>>>> idea:
>>>>> a. when Nova and Glance use the same backend storage, we could use
>>>>> the storage's own CoW/snapshot capability to prepare the VM disk
>>>>> instead of copying/transferring image bits (through HTTP/network or
>>>>> local copy).
>>>>> b. without "unified" storage, we could attach a volume/LUN to the
>>>>> compute node from the backend storage as a base image, then do such
>>>>> a CoW/snapshot on it to prepare the root/ephemeral disk of the VM.
>>>>> This is just like boot-from-volume, but the difference is that we do
>>>>> the CoW/snapshot on the Nova side instead of the Cinder/storage side
>>>>> (a rough sketch follows below).
>>>>>
>>>>> For option #a, we have already got some progress:
>>>>> https://blueprints.launchpad.net/nova/+spec/image-multiple-location
>>>>> https://blueprints.launchpad.net/nova/+spec/rbd-clone-image-handler
>>>>> https://blueprints.launchpad.net/nova/+spec/vmware-clone-image-handler
>>>>>
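A minimal sketch of the Nova-side CoW in option #b, assuming the
base-image LUN is already attached to the compute node; the device name
and overlay path are made-up examples:

    # Rough sketch of option #b: create a local qcow2 CoW overlay on top of
    # an attached base-image LUN. The VM boots from the overlay; reads fall
    # through to the base device on demand, while writes stay local.
    # Paths below are illustrative only.
    import subprocess

    BASE_DEV = "/dev/mapper/base_image_lun"                       # attached LUN (example)
    OVERLAY = "/var/lib/nova/instances/demo-instance/disk.qcow2"  # example path

    def create_cow_overlay():
        # qcow2 overlay whose backing file is the attached base device.
        subprocess.check_call(
            ["qemu-img", "create", "-f", "qcow2", "-b", BASE_DEV, OVERLAY])

    if __name__ == "__main__":
        create_cow_overlay()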
>>>>> Under the #b approach, my former experience from our previous
>>>>> similar Cloud deployment (not OpenStack) was that: with 2 PC server
>>>>> storage nodes (general *local SAS disk*, without any storage
>>>>> backend) + 2-way/multi-path iSCSI + 1G network bandwidth, we could
>>>>> provision 500 VMs in a minute.
>>>>>
>>>>> On the vmThunder topic, I think it sounds like a good idea; IMO P2P
>>>>> prefetching is a valuable optimization approach for image
>>>>> transferring.
>>>>>
>>>>> zhiyan
>>>>>
>>>>> On Wed, Apr 16, 2014 at 9:14 PM, yongquan Fu <quanyo...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Dear all,
>>>>>>
>>>>>> We would like to present an extension to the vm-booting
>>>>>> functionality of Nova for the case when a number of homogeneous VMs
>>>>>> need to be launched at the same time.
>>>>>>
>>>>>> The motivation for our work is to increase the speed of
>>>>>> provisioning VMs for large-scale scientific computing and big data
>>>>>> processing. In those cases, we often need to boot tens or hundreds
>>>>>> of virtual machine instances at the same time.
>>>>>>
>>>>>> Currently, under OpenStack, we found that creating a large number
>>>>>> of virtual machine instances is very time-consuming. The reason is
>>>>>> that the booting procedure is a centralized operation with
>>>>>> performance bottlenecks. Before a virtual machine can actually be
>>>>>> started, OpenStack either copies the image file (swift) or attaches
>>>>>> the image volume (cinder) from the storage server to the compute
>>>>>> node via the network. Booting a single VM needs to read a large
>>>>>> amount of image data from the image storage server, so creating a
>>>>>> large number of virtual machine instances causes a significant
>>>>>> workload on those servers. The servers become quite busy, even
>>>>>> unavailable, during the deployment phase, and it takes a very long
>>>>>> time before the whole virtual machine cluster is usable.
>>>>>>
>>>>>> Our extension is based on our work on vmThunder, a novel mechanism
>>>>>> for accelerating the deployment of large numbers of virtual machine
>>>>>> instances. It is written in Python and can be integrated with
>>>>>> OpenStack easily. VMThunder addresses the problem described above
>>>>>> with the following improvements: on-demand transferring (network
>>>>>> attached storage), compute-node caching, P2P transferring and
>>>>>> prefetching. VMThunder is a scalable and cost-effective accelerator
>>>>>> for bulk provisioning of virtual machines.
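As a rough back-of-the-envelope check of the numbers discussed earlier
in this thread (300MB read per boot, 500 VMs, 2 storage nodes, a
one-minute window); the 50MB on-demand working set below is purely an
assumption for illustration, not a measured VMThunder figure:

    # Back-of-the-envelope check of the provisioning-load numbers quoted
    # earlier in the thread.
    vms = 500              # instances booted in one shot
    image_read_mb = 300    # data read per boot when the full image is copied
    storage_nodes = 2
    window_s = 60          # target provisioning window: one minute

    full_copy = vms * image_read_mb / float(storage_nodes * window_s)
    print("full-copy load per storage node: %.0f MB/s" % full_copy)
    # -> 1250 MB/s, i.e. the ~1.25 GB/s per node mentioned above

    # If each instance only reads a small hot working set on demand
    # (assumed 50MB here) and the rest is cached/shared on compute nodes,
    # the burst load on the storage nodes drops proportionally.
    hot_set_mb = 50        # assumption for illustration only
    on_demand = vms * hot_set_mb / float(storage_nodes * window_s)
    print("on-demand load per storage node: %.0f MB/s" % on_demand)
    # -> ~208 MB/s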
>>>>>>
>>>>>> We hope to receive your feedback. Any comments are extremely
>>>>>> welcome. Thanks in advance.
>>>>>>
>>>>>> PS:
>>>>>>
>>>>>> VMThunder enhanced nova blueprint:
>>>>>> https://blueprints.launchpad.net/nova/+spec/thunderboost
>>>>>> VMThunder standalone project: https://launchpad.net/vmthunder
>>>>>> VMThunder prototype: https://github.com/lihuiba/VMThunder
>>>>>> VMThunder etherpad: https://etherpad.openstack.org/p/vmThunder
>>>>>> VMThunder portal: http://www.vmthunder.org/
>>>>>> VMThunder paper:
>>>>>> http://www.computer.org/csdl/trans/td/preprint/06719385.pdf
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> vmThunder development group
>>>>>> PDL
>>>>>> National University of Defense Technology
>>>>>>
>>>>>
>>>>> --
>>>>> Yongquan Fu
>>>>> PhD, Assistant Professor,
>>>>> National Key Laboratory for Parallel and Distributed Processing,
>>>>> College of Computer Science, National University of Defense
>>>>> Technology, Changsha, Hunan Province, P.R. China 410073
>>>>>

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev