Hello,
I have Eucalyptus 1.6.2 installed from source on Ubuntu 10.04 with KVM. Currently I have ten nodes in my cloud, in a single-cluster architecture.
I have also tested Hadoop on VMs and run several jobs there.
I am now trying to run Hadoop in a cloud environment, so I will launch Hadoop instances on the cloud. Each Hadoop node holds a large amount of data, so for now I plan to use volumes to store the data of each instance, i.e. each Hadoop node. But since volumes live on the Storage Controller, this means a continuous movement of data (many GBs) across the cloud network from the SC to the nodes, and the response time of work done on the Hadoop instances will suffer because of the time the data spends travelling over the network.
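For concreteness, this is roughly what I plan to do for each instance right now (size, zone, IDs, device name and mount point below are just placeholders; on KVM the device may show up under a different name inside the guest):

    # from a euca2ools client: create a volume and attach it to the instance
    euca-create-volume -s 100 -z cluster01
    euca-attach-volume -i i-XXXXXXXX -d /dev/sdb vol-XXXXXXXX

    # inside the instance: format and mount the volume
    mkfs.ext3 /dev/sdb
    mkdir -p /mnt/hdfs-data
    mount /dev/sdb /mnt/hdfs-data
    # then point dfs.data.dir in conf/hdfs-site.xml at /mnt/hdfs-data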
So, is it possible to store volumes (or use some other mechanism) directly on the nodes, so that the above problem can be avoided?
Second case: I could store the data on the hard disks attached to the nodes, and the Hadoop instances could access that data easily, but for that I would need to start each instance on the node where its data is stored. So, is there any hack, or any other way, to choose the node on which an instance is started? One idea I had is sketched below.
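As far as I can tell, SCHEDPOLICY in eucalyptus.conf only switches between GREEDY and ROUNDROBIN, so it cannot target a particular node. The only hack I have come up with so far (untested, and the hostnames and image ID are placeholders) is to temporarily de-register every node except the target one on the cluster controller, launch the instance so it has nowhere else to land, and then register the other nodes again:

    # on the cluster controller, as root (node01 is the target node)
    euca_conf --deregister-nodes "node02 node03 node04"

    # from the client: the instance can now only be scheduled on node01
    euca-run-instances -n 1 -t c1.medium -k mykey emi-XXXXXXXX

    # afterwards, register the other nodes again
    euca_conf --register-nodes "node02 node03 node04"

Would this work, or is there a cleaner way?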
Can anyone with working experience of Hadoop in a cloud environment give me some pointers? I would really appreciate any sort of support on this.
Finally, is it worthwhile to do this at all? I previously received a response along these lines:
My earlier question:
> Is it possible to run Hadoop in VMs on production clusters, so that we have 10000s of nodes on 100s of servers and achieve high performance through cloud computing?

The reply:
> You don't achieve performance that way. You are better off with one VM per physical host, and you will need to talk to a persistent filestore for the data you want to retain. Running more than one VM per physical host just creates conflict for things like disk, network and CPU that the virtual OS won't be aware of. Also, VM-to-disk performance is pretty bad right now, though that's improving.
Thanks & Regards
Adarsh Sharma