Hello,

I have Eucalyptus 1.6.2 installed from source on Ubuntu 10.04 with KVM. Currently I have ten nodes in my cloud in a single-cluster architecture.
I have also tested Hadoop on VMs and run several jobs.

I am trying to run Hadoop in a cloud environment, so I will launch Hadoop instances on the cloud. Each Hadoop node holds a huge amount of data, and for now I am planning to use volumes to store the data of each instance, i.e. each Hadoop node. But since volumes are stored at the Storage Controller (SC), this means there is continuous movement of data (many GBs) across the cloud network from the SC to the node, and the response time of work done on the Hadoop instances will suffer because of the time the data spends travelling over the network.
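For reference, this is roughly how I plan to create and attach those volumes, using boto against the EC2 API that Eucalyptus exposes (the endpoint, credentials, zone name and instance ID below are just placeholders for my setup):

    from boto.ec2.connection import EC2Connection
    from boto.ec2.regioninfo import RegionInfo

    # Cloud controller endpoint and credentials are placeholders -- substitute your own.
    region = RegionInfo(name="eucalyptus", endpoint="192.168.1.1")
    conn = EC2Connection("EC2_ACCESS_KEY", "EC2_SECRET_KEY",
                         is_secure=False, region=region,
                         port=8773, path="/services/Eucalyptus")

    # Create a 50 GB volume in the cluster's availability zone and attach it to a
    # running Hadoop instance as /dev/vdb (size, zone and instance ID are examples).
    vol = conn.create_volume(50, "cluster1")
    conn.attach_volume(vol.id, "i-XXXXXXXX", "/dev/vdb")

Inside the instance I would then format and mount /dev/vdb and point dfs.data.dir at the mount point, but the underlying storage still lives on the SC, so every block the DataNode writes has to cross the cloud network between the SC and the node.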

So, is it possible to store volumes (or use any other mechanism) directly on the nodes so that the above problem can be avoided?

Second case: I could store the data on the hard disk attached to a node, and the Hadoop instances could access that data easily, but for that I would need to start the instances on the node where the data is stored. So, is there any hack or other way to decide which node an instance is started on?
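From what I can see, the EC2 API that Eucalyptus exposes only lets me target an availability zone, which maps to a cluster, not to an individual node controller, so the best I can do through the API is something like the sketch below (again with boto; the image ID, key name and zone are placeholders):

    from boto.ec2.connection import EC2Connection
    from boto.ec2.regioninfo import RegionInfo

    # Same placeholder endpoint and credentials as in the earlier sketch.
    region = RegionInfo(name="eucalyptus", endpoint="192.168.1.1")
    conn = EC2Connection("EC2_ACCESS_KEY", "EC2_SECRET_KEY",
                         is_secure=False, region=region,
                         port=8773, path="/services/Eucalyptus")

    # 'placement' is the availability zone; in Eucalyptus that narrows the choice
    # to a cluster, not to the specific node that holds the data.
    reservation = conn.run_instances("emi-XXXXXXXX",
                                     min_count=1, max_count=1,
                                     key_name="mykey",
                                     instance_type="m1.large",
                                     placement="cluster1")
    print(reservation.instances[0].id)

To actually pin the instance to the one node that holds the data, I suppose I would have to interfere with the cluster controller's scheduling (or temporarily take the other node controllers out of service), which is the kind of hack I am asking about.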

Can anyone who has some working experience with Hadoop in a cloud environment give me any pointers?
I would really appreciate any help on this.

Finally, is it worthwhile to do this at all? I previously received a response along these lines:

Is it possible to run Hadoop in VMs on production clusters so that we
have 10000s of nodes on 100s of servers to achieve high performance
through cloud computing?

You don't achieve performance that way. You are better off with 1 VM per physical host, and you will need to talk to a persistent filestore for the data you want to retain. Running more than 1 VM per physical host just creates contention for things like disk, network and CPU that the virtual OS won't be aware of. Also, VM-to-disk performance is pretty bad right now, though that's improving.


Thanks & Regards

Adarsh Sharma
