Mithila,

You said all the slaves were being utilized in the 3-node cluster. Which application did you run to test that, and what was your input size? If you ran the word count application on a 516 MB input file on both cluster setups, then some of the nodes in the 15-node cluster may not be running any tasks at all. Generally, one map task is assigned to each input split, and if you are running your cluster with the defaults, the splits are 64 MB each. A 516 MB file therefore yields fewer than ten splits, which is not enough to give every one of 15 nodes something to do.

I got confused when you said the Namenode seemed to do all the work. Can you check conf/slaves and make sure you listed the hostnames of all the task trackers there? I also suggest comparing both clusters with a larger input size, say at least 5 GB, to really see a difference.
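To make the split arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. It is not Hadoop code: the function names are made up for illustration, it assumes a single input file and the default 64 MB split size, and it treats "one map task per split" as an upper bound. The real split count can differ slightly (for example, Hadoop may fold a small tail into the previous split).

```python
import math

MB = 1024 * 1024

def estimate_map_tasks(input_bytes, split_bytes=64 * MB):
    """Rough upper bound on map tasks: one task per input split.

    Hypothetical helper for illustration; assumes a single input file.
    """
    return math.ceil(input_bytes / split_bytes)

def max_busy_nodes(input_bytes, nodes, split_bytes=64 * MB):
    """At most one node per map task can be kept busy with map work."""
    return min(nodes, estimate_map_tasks(input_bytes, split_bytes))

if __name__ == "__main__":
    for label, size in (("516 MB", 516 * MB), ("5 GB", 5 * 1024 * MB)):
        tasks = estimate_map_tasks(size)
        print(f"{label}: ~{tasks} map tasks, "
              f"at most {max_busy_nodes(size, 3)} of 3 nodes busy, "
              f"at most {max_busy_nodes(size, 15)} of 15 nodes busy")
```

With these assumptions, the 516 MB input gives roughly 9 map tasks, so at most 9 of the 15 nodes can have map work, while a 5 GB input gives about 80 tasks, enough to occupy all 15.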
Jim

On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <[email protected]> wrote:

> in hadoop-*-examples.jar, use "randomwriter" to generate the data and
> "sort" to sort it.
>
> - Aaron
>
> On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <[email protected]> wrote:
>
> > Your data is too small I guess for 15 clusters ..So it might be
> > overhead time of these clusters making your total MR jobs more time
> > consuming. I guess you will have to try with larger set of data..
> >
> > Pankil
> >
> > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <[email protected]> wrote:
> >
> > > Aaron
> > >
> > > That could be the issue, my data is just 516MB - wouldn't this see
> > > a bit of speed up?
> > > Could you guide me to the example? I ll run my cluster on it and
> > > see what I get. Also for my program I had a java timer running to
> > > record the time taken to complete execution. Does Hadoop have an
> > > inbuilt timer?
> > >
> > > Mithila
> > >
> > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <[email protected]> wrote:
> > >
> > > > Virtually none of the examples that ship with Hadoop are designed
> > > > to showcase its speed. Hadoop's speedup comes from its ability to
> > > > process very large volumes of data (starting around, say, tens of
> > > > GB per job, and going up in orders of magnitude from there). So
> > > > if you are timing the pi calculator (or something like that), its
> > > > results won't necessarily be very consistent. If a job doesn't
> > > > have enough fragments of data to allocate one per each node, some
> > > > of the nodes will also just go unused.
> > > >
> > > > The best example for you to run is to use randomwriter to fill up
> > > > your cluster with several GB of random data and then run the sort
> > > > program. If that doesn't scale up performance from 3 nodes to 15,
> > > > then you've definitely got something strange going on.
> > > >
> > > > - Aaron
> > > >
> > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]> wrote:
> > > >
> > > > > Hey all
> > > > > I recently setup a three node hadoop cluster and ran an
> > > > > examples on it. It was pretty fast, and all the three nodes
> > > > > were being used (I checked the log files to make sure that the
> > > > > slaves are utilized).
> > > > >
> > > > > Now I ve setup another cluster consisting of 15 nodes. I ran
> > > > > the same example, but instead of speeding up, the map-reduce
> > > > > task seems to take forever! The slaves are not being used for
> > > > > some reason. This second cluster has a lower, per node
> > > > > processing power, but should that make any difference?
> > > > > How can I ensure that the data is being mapped to all the
> > > > > nodes? Presently, the only node that seems to be doing all the
> > > > > work is the Master node.
> > > > >
> > > > > Does 15 nodes in a cluster increase the network cost? What can
> > > > > I do to setup the cluster to function more efficiently?
> > > > >
> > > > > Thanks!
> > > > > Mithila Nagendra
> > > > > Arizona State University
