I thought about this a bit yesterday after Chiradeep talked to me. The first fix is definitely to allow multiple local storage pools per host. That requires some work on CloudStack, but I don't see it as a big problem.
Then a storage-pool allocator can be written so that it always allocates separate local storage pools to VMs on the same host. That should be minimal work and can be taken on as a side project.

--Alex

> -----Original Message-----
> From: Chiradeep Vittal [mailto:chiradeep.vit...@citrix.com]
> Sent: Tuesday, June 11, 2013 1:07 PM
> To: dev@cloudstack.apache.org
> Subject: Re: Hadoop cluster running in cloudstack
>
> Taking it to dev@ to see if there is any interest.
>
> It is a good and interesting requirement. I can see hacking 'pre-setup'
> storage with tags to achieve this, but it is going to be a fragile hack.
> I believe GCE also has the concept of some instance types having dedicated
> spindles.
>
> On 6/6/13 11:14 AM, "David Ortiz" <dpor...@outlook.com> wrote:
>
> >Chiradeep,
> >    Currently I am working with KVM hypervisor nodes. The use case of
> >having 4 spindles and assigning one to each node is exactly what I
> >would like to do. For the moment I have all four spindles configured
> >in a RAID with the cloudstack local storage pointed at it.
> >Shanker,
> >    I had not seen that slideshow yet, so thank you for pointing me
> >to it. As of now, the hadoop resources I am using are statically
> >allocated between 4 hosts. As it stands now, I am constrained to those
> >resources without the ability to add any additional storage cluster (or
> >additional storage to my current shared storage appliance), or
> >additional nodes.
> >Fortunately, my use cases don't require any kind of reallocation of the
> >hadoop nodes. It's more clients for the cluster as well as web service
> >nodes that run clients that are being dynamically spun up and down. I
> >have found that I can get through my jobs alright, they just take a lot
> >of extra time to run since I have the storage acting as a bottleneck
> >right now.
> >Thanks, David Ortiz
> >
> >> From: run...@gmail.com
> >> Subject: Re: Hadoop cluster running in cloudstack
> >> Date: Thu, 6 Jun 2013 10:23:50 -0400
> >> To: us...@cloudstack.apache.org
> >>
> >> On Jun 6, 2013, at 4:05 AM, Shanker Balan
> >> <shanker.ba...@shapeblue.com> wrote:
> >>
> >> > On 05-Jun-2013, at 12:13 AM, David Ortiz <dpor...@outlook.com> wrote:
> >> >
> >> >> Hello,
> >> >>     Has anyone tried running a hadoop cluster in a cloudstack
> >> >> environment? I have set one up, but I am finding that I am having
> >> >> some IO contention between slave nodes on each host since they all
> >> >> share one local storage pool. As I understand it, there is not
> >> >> currently a method for using multiple local storage pools with VMs
> >> >> through cloudstack. Has anyone found a workaround for this by any
> >> >> chance?
> >> >
> >> > Hi David,
> >> >
> >> > Have you seen Seb's
> >> > http://www.slideshare.net/sebastiengoasguen/cloudstack-and-bigdata
> >> > slides yet?
> >>
> >> As a quick disclaimer, the various configurations I highlight in this
> >> deck are a bit hand wavy and I did not test them. I just made a guess
> >> about how one might want to use the baremetal functionality in
> >> cloudstack. The main distinction being between using a "big data"
> >> store as storage backends of cloudstack and using cloudstack to
> >> provision a bigdata store on-demand.
> >>
> >> -sebastien
> >>
> >> >
> >> > In my experience running Hadoop (100+ nodes) on traditional
> >> > servers, it's going to be really hard to scale up Hadoop workloads
> >> > using local storage and HDFS on a cloud.
> >> >
> >> > I ran out of IOPS very quickly. There was enough CPU headroom but
> >> > could not add more slots as disk became the bottleneck. Every time
> >> > there was a node/disk failure, rebalancing was a nightmare with a 3x
> >> > HDFS replication factor.
> >> >
> >> > If I were to run Hadoop on an IaaS cloud, I would do it very
> >> > similarly to Amazon AWS EMR - instances backed by a "Storage As A
> >> > Service" layer (S3) for big data instead of HDFS.
> >> >
> >> > The system would work as below:
> >> >
> >> > - Create a dedicated big data storage tier using a distributed
> >> > filesystem like Gluster/Ceph/Isilon. Most of the vendors now provide
> >> > S3 compat connectors for Hadoop.
> >> >
> >> > http://ceph.com/docs/master/cephfs/hadoop/
> >> > http://gluster.org/community/documentation/index.php/Hadoop
> >> > http://www.emc.com/big-data/scale-out-storage-hadoop.htm
> >> >
> >> > - Hadoop instances are spun up on bare metal or on hypervisors. The
> >> > service offerings for "big data" instances would run on dedicated
> >> > hypervisors (via tags) with high bandwidth network connectivity to
> >> > the storage service.
> >> >
> >> > - Hadoop instances use Local storage for run time data.
> >> >
> >> > - Hadoop VMs connect to the storage tier via connectors for
> >> > permanent storage
> >> >
> >> > Benefits:
> >> >
> >> > - Spinning up/down VMs doesn't cause HDFS rebalancing as there is no
> >> > HDFS anywhere.
> >> >
> >> > - Scale out VMs independently of storage. Add more spindles / nodes
> >> > to the storage cluster to scale out IOPS and capacity
> >> >
> >> > - Easy upgrade of Hadoop releases without risk to data
> >> >
> >> > Regards.
> >> > @shankerbalan
> >> >
> >> > --
> >> > Shanker Balan
> >> > Managing Consultant
> >> >
> >> > M: +91 98860 60539
> >> > shanker.ba...@shapeblue.com | www.shapeblue.com |
> >> > Twitter:@shapeblue ShapeBlue India, 22nd floor, Unit 2201A, World
> >> > Trade Centre, Bangalore - 560 055
> >> >
> >> > This email and any attachments to it may be confidential and are
> >> > intended solely for the use of the individual to whom it is addressed.
> >> > Any views or opinions expressed are solely those of the author and do
> >> > not necessarily represent those of Shape Blue Ltd or related companies.
> >> > If you are not the intended recipient of this email, you must neither
> >> > take any action based upon its contents, nor copy or show it to anyone.
> >> > Please contact the sender if you believe you have received this email
> >> > in error. Shape Blue Ltd is a company incorporated in England & Wales.
> >> > ShapeBlue Services India LLP is operated under license from Shape Blue
> >> > Ltd. ShapeBlue is a registered trademark.
> >>
> >
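
The allocator idea at the top of this thread (give each VM on a host its own local storage pool where possible) can be illustrated with a minimal sketch. This is not CloudStack code and does not use CloudStack's allocator interfaces; `LocalPool`, `Host`, `allocate_pool`, and the host and pool names are all hypothetical, and the sketch assumes the host already exposes several local pools, which the thread notes is not yet supported.

```python
from dataclasses import dataclass, field


@dataclass
class LocalPool:
    """Hypothetical local storage pool, e.g. one spindle on a host."""
    name: str
    vm_names: list = field(default_factory=list)


@dataclass
class Host:
    """Hypothetical hypervisor host that exposes several local pools."""
    name: str
    pools: list


def allocate_pool(host, vm_name):
    """Place vm_name on the least-loaded local pool of the host.

    While the host still has an unused pool, each new VM gets a pool to
    itself; once every pool is in use, VMs are spread as evenly as possible.
    """
    if not host.pools:
        raise ValueError(f"host {host.name} has no local storage pools")
    # min() returns the first pool with the fewest VMs, so ties break in list order.
    pool = min(host.pools, key=lambda p: len(p.vm_names))
    pool.vm_names.append(vm_name)
    return pool


# Illustrative use: four spindles, four Hadoop VMs on one KVM host.
host = Host("kvm-host-1", [LocalPool(f"spindle-{i}") for i in range(4)])
placements = {
    vm: allocate_pool(host, vm).name
    for vm in ["hadoop-1", "hadoop-2", "hadoop-3", "hadoop-4"]
}
```

With four pools and four VMs, each VM ends up on a different pool, which is the one-spindle-per-node layout David describes. A real allocator would also need to account for pool capacity and storage tags, which this sketch ignores.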