Your data is probably too small for 15 nodes. The per-task startup overhead on those nodes may be what is making your total MR job take longer. I guess you will have to try with a larger data set.
Pankil

On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <[email protected]> wrote:
> Aaron
>
> That could be the issue, my data is just 516MB - wouldn't this see a bit
> of speedup? Could you guide me to the example? I'll run my cluster on it
> and see what I get. Also, for my program I had a Java timer running to
> record the time taken to complete execution. Does Hadoop have an inbuilt
> timer?
>
> Mithila
>
> On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <[email protected]> wrote:
>
> > Virtually none of the examples that ship with Hadoop are designed to
> > showcase its speed. Hadoop's speedup comes from its ability to process
> > very large volumes of data (starting around, say, tens of GB per job,
> > and going up in orders of magnitude from there). So if you are timing
> > the pi calculator (or something like that), its results won't
> > necessarily be very consistent. If a job doesn't have enough fragments
> > of data to allocate one per each node, some of the nodes will also just
> > go unused.
> >
> > The best example for you to run is to use randomwriter to fill up your
> > cluster with several GB of random data and then run the sort program.
> > If that doesn't scale up performance from 3 nodes to 15, then you've
> > definitely got something strange going on.
> >
> > - Aaron
> >
> > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]>
> > wrote:
> >
> > > Hey all
> > > I recently set up a three-node Hadoop cluster and ran an example on
> > > it. It was pretty fast, and all three nodes were being used (I
> > > checked the log files to make sure that the slaves were utilized).
> > >
> > > Now I've set up another cluster consisting of 15 nodes. I ran the
> > > same example, but instead of speeding up, the map-reduce task seems
> > > to take forever! The slaves are not being used for some reason. This
> > > second cluster has lower per-node processing power, but should that
> > > make any difference?
> > > How can I ensure that the data is being mapped to all the nodes?
> > > Presently, the only node that seems to be doing all the work is the
> > > Master node.
> > >
> > > Does having 15 nodes in a cluster increase the network cost? What can
> > > I do to set up the cluster to function more efficiently?
> > >
> > > Thanks!
> > > Mithila Nagendra
> > > Arizona State University
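[Editor's note: the randomwriter-then-sort benchmark Aaron suggests looks roughly like the commands below. The example jar name and the HDFS paths are guesses for a 2009-era Hadoop install and will vary by release; this is a sketch, not exact syntax for any one version.]

```shell
# Fill the cluster with random data - in old releases randomwriter wrote
# on the order of 10 GB per node by default - then sort it. The sort job
# is the part worth timing when comparing the 3-node and 15-node clusters.
bin/hadoop jar hadoop-*-examples.jar randomwriter /benchmarks/random-data
bin/hadoop jar hadoop-*-examples.jar sort /benchmarks/random-data /benchmarks/sorted-data
```

If the sort does not get faster going from 3 to 15 nodes on several GB of input, the problem is in the cluster setup (e.g. slaves not registered with the JobTracker), not in the job itself.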
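[Editor's note: on the timer question - Hadoop does not expose a dedicated stopwatch API for user code, though the JobTracker web UI records each job's start and finish times. The wall-clock approach Mithila describes can be sketched as below; the `JobTimer` class name is made up for illustration, and the sleep stands in for the real job submission.]

```java
// Minimal sketch of timing a MapReduce job from the driver program:
// bracket the (blocking) job run with wall-clock timestamps.
public class JobTimer {
    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // ... run the job here, e.g. a blocking JobClient.runJob(conf) call ...
        Thread.sleep(100); // stand-in for the real job in this sketch
        long elapsedMs = System.currentTimeMillis() - start;
        System.out.println("Job took " + elapsedMs + " ms");
    }
}
```

For comparing cluster sizes this is good enough, since job runtimes at this scale are seconds to minutes and millisecond precision is ample.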
