RE: Hadoop cluster running in cloudstack

Alex Huang Wed, 12 Jun 2013 09:11:23 -0700

I thought about this a bit yesterday after Chiradeep talked to me.

The first fix is definitely to allow multiple local storage pools per host.
That requires some work on cloudstack, but I don't see it as a big problem.


Then a storage-pool allocator can be written such that it always allocates
separate local storage pools to VMs on the same host.  That should be minimal
work and can be taken as a side project.
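
To make the allocation rule concrete, here is a rough sketch in Python rather
than against the actual cloudstack allocator interfaces.  Everything in it is
an assumption for illustration: the function name, the pool IDs and the VM
names are hypothetical, and it only shows the "pick the least-used local pool
on this host" idea, not how a real allocator would be plugged in.

```python
from collections import Counter
from typing import Dict, List


def pick_local_pool(host_pools: List[str], placements: Dict[str, str]) -> str:
    """Return the local pool on this host that currently backs the fewest VMs.

    host_pools: hypothetical IDs of the local storage pools on one host.
    placements: VM name -> pool ID for VMs already placed on that host.
    """
    if not host_pools:
        raise ValueError("host has no local storage pools")
    # Start every pool at zero so unused pools are counted too.
    load = Counter({pool: 0 for pool in host_pools})
    for pool in placements.values():
        if pool in load:
            load[pool] += 1
    # min() returns the first minimum, so ties resolve in host_pools order.
    return min(host_pools, key=lambda pool: load[pool])


# Four spindles exposed as four separate pools on one host (names made up).
pools = ["disk1", "disk2", "disk3", "disk4"]
placements: Dict[str, str] = {}
for vm in ["hadoop-slave-1", "hadoop-slave-2", "hadoop-slave-3", "hadoop-slave-4"]:
    placements[vm] = pick_local_pool(pools, placements)
print(placements)
```

With as many pools as VMs, each VM lands on its own pool.  Once there are more
VMs than pools, this sketch falls back to sharing the least-loaded pool; a
stricter allocator could refuse the placement instead.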

--Alex

> -----Original Message-----
> From: Chiradeep Vittal [mailto:chiradeep.vit...@citrix.com]
> Sent: Tuesday, June 11, 2013 1:07 PM
> To: dev@cloudstack.apache.org
> Subject: Re: Hadoop cluster running in cloudstack
> 
> Taking it to dev@ to see if there is any interest.
> 
> 
> It is a good and interesting requirement. I can see hacking 'pre-setup'
> storage with tags to achieve this, but it is going to be a fragile hack.
> I believe GCE also has the concept of some instance types having dedicated
> spindles.
> 
> 
> On 6/6/13 11:14 AM, "David Ortiz" <dpor...@outlook.com> wrote:
> 
> >Chiradeep,
> >     Currently I am working with KVM hypervisor nodes.  The use case of
> >having 4 spindles and assigning one to each node is exactly what I
> >would like to do.  For the moment I have all four spindles configured
> >in a RAID with the cloudstack local storage pointed at it.
> >
> >Shanker,
> >      I had not seen that slideshow yet, so thank you for pointing me
> >to it.  As of now, the hadoop resources I am using are statically
> >allocated between 4 hosts.  As it stands now, I am constrained to those
> >resources without the ability to add any additional storage cluster (or
> >additional storage to my current shared storage appliance), or additional
> >nodes.
> >Fortunately, my use cases don't require any kind of reallocation of the
> >hadoop nodes.  It's more clients for the cluster as well as web service
> >nodes that run clients that are being dynamically spun up and down.  I
> >have found that I can get through my jobs alright, they just take a lot
> >of extra time to run since I have the storage acting as a bottleneck
> >right now.
> >Thanks,
> >David Ortiz
> >
> >> From: run...@gmail.com
> >> Subject: Re: Hadoop cluster running in cloudstack
> >> Date: Thu, 6 Jun 2013 10:23:50 -0400
> >> To: us...@cloudstack.apache.org
> >>
> >>
> >> On Jun 6, 2013, at 4:05 AM, Shanker Balan
> >> <shanker.ba...@shapeblue.com> wrote:
> >>
> >> > On 05-Jun-2013, at 12:13 AM, David Ortiz <dpor...@outlook.com> wrote:
> >> >
> >> >> Hello,
> >> >>    Has anyone tried running a hadoop cluster in a cloudstack
> >> >> environment?  I have set one up, but I am finding that I am having
> >> >> some IO contention between slave nodes on each host since they all
> >> >> share one local storage pool.  As I understand it, there is not
> >> >> currently a method for using multiple local storage pools with VMs
> >> >> through cloudstack.  Has anyone found a workaround for this by any
> >> >> chance?
> >> >
> >> >
> >> > Hi David,
> >> >
> >> > Have you seen Seb's
> >> > http://www.slideshare.net/sebastiengoasguen/cloudstack-and-bigdata
> >> > slides yet?
> >>
> >> As a quick disclaimer, the various configurations I highlight in this
> >>deck are a bit hand wavy and I did not test them. I just made a guess
> >>about how one might want to use the baremetal functionality in
> >>cloudstack. The main distinction being between using a "big data"
> >>store as storage backends of cloudstack and using cloudstack to
> >>provision a bigdata store on-demand.
> >>
> >> -sebastien
> >>
> >> >
> >> > In my experience running Hadoop (100+ nodes) on traditional servers,
> >> > it's going to be really hard to scale up Hadoop workloads using local
> >> > storage and HDFS on a cloud.
> >> >
> >> > I ran out of IOPS very quickly. There was enough CPU headroom, but I
> >> > could not add more slots as disk became the bottleneck. Every time
> >> > there was a node/disk failure, rebalancing was a nightmare with a 3x
> >> > HDFS replication factor.
> >> >
> >> > If I were to run Hadoop on an IaaS cloud, I would do it very
> >> > similarly to Amazon AWS EMR: instances backed by a "Storage As A
> >> > Service" layer (S3) for big data instead of HDFS.
> >> >
> >> > The system would work as below:
> >> >
> >> > - Create a dedicated big data storage tier using a distributed
> >> > filesystem like Gluster/Ceph/Isilon. Most of the vendors now provide
> >> > S3 compat connectors for Hadoop.
> >> >
> >> > http://ceph.com/docs/master/cephfs/hadoop/
> >> > http://gluster.org/community/documentation/index.php/Hadoop
> >> > http://www.emc.com/big-data/scale-out-storage-hadoop.htm
> >> >
> >> > - Hadoop instances are spun up on bare metal or on hypervisors. The
> >> > service offerings for "big data" instances would run on dedicated
> >> > hypervisors (via tags) with high bandwidth network connectivity to the
> >> > storage service.
> >> >
> >> > - Hadoop instances use Local storage for run time data.
> >> >
> >> > - Hadoop VMs connect to the storage tier via connectors for
> >> > permanent storage.
> >> >
> >> > Benefits:
> >> >
> >> > - Spinning up/down VMs doesn't cause HDFS rebalancing as there is no
> >> > HDFS anywhere.
> >> >
> >> > - Scale out VMs independently of storage. Add more spindles / nodes
> >>to the storage cluster to scale out IOPS and capacity
> >> >
> >> > - Easy upgrade of Hadoop releases without risk to data
> >> >
> >> > Regards.
> >> > @shankerbalan
> >> >
> >> > --
> >> > Shanker Balan
> >> > Managing Consultant
> >> >
> >> >
> >> >
> >> > M: +91 98860 60539
> >> > shanker.ba...@shapeblue.com | www.shapeblue.com |
> >> > Twitter:@shapeblue ShapeBlue India, 22nd floor, Unit 2201A, World
> >> > Trade Centre,
> >>Bangalore - 560 055
> >> >
> >> > This email and any attachments to it may be confidential and are
> >>intended solely for the use of the individual to whom it is addressed.
> >>Any views or opinions expressed are solely those of the author and do
> >>not necessarily represent those of Shape Blue Ltd or related companies.
> >>If you are not the intended recipient of this email, you must neither
> >>take any action based upon its contents, nor copy or show it to anyone.
> >>Please contact the sender if you believe you have received this email
> >>in error. Shape Blue Ltd is a company incorporated in England & Wales.
> >>ShapeBlue Services India LLP is operated under license from Shape Blue
> >>Ltd. ShapeBlue is a registered trademark.
> >>
> >
