Virtually none of the examples that ship with Hadoop are designed to showcase its speed. Hadoop's speedup comes from its ability to process very large volumes of data (starting around, say, tens of GB per job, and going up in orders of magnitude from there). So if you are timing the pi calculator (or something similar), your results won't necessarily be very consistent. And if a job doesn't have enough fragments of data to allocate at least one to each node, some of the nodes will simply go unused.
The best example for you to run is to use randomwriter to fill up your cluster with several GB of random data and then run the sort program. If that doesn't scale up in performance from 3 nodes to 15, then you've definitely got something strange going on.

- Aaron

On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]> wrote:
> Hey all
> I recently set up a three-node Hadoop cluster and ran one of the examples
> on it. It was pretty fast, and all three nodes were being used (I checked
> the log files to make sure that the slaves were utilized).
>
> Now I've set up another cluster consisting of 15 nodes. I ran the same
> example, but instead of speeding up, the map-reduce task seems to take
> forever! The slaves are not being used for some reason. This second cluster
> has lower per-node processing power, but should that make any difference?
> How can I ensure that the data is being mapped to all the nodes? Presently,
> the only node that seems to be doing any work is the master node.
>
> Does having 15 nodes in a cluster increase the network cost? What can I do
> to set up the cluster to function more efficiently?
>
> Thanks!
> Mithila Nagendra
> Arizona State University
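[Editor's note] The randomwriter-then-sort benchmark suggested above can be invoked roughly as follows. This is a sketch: the examples jar name and the HDFS paths are assumptions and vary by Hadoop version, and randomwriter's output size is controlled by configuration properties whose defaults differ across releases.

```shell
# Fill HDFS with random data (one map task per node writes a chunk).
# The jar name below is an assumption; match it to your Hadoop release.
hadoop jar hadoop-*-examples.jar randomwriter /benchmarks/random-data

# Sort the generated data; this exercises the full map/shuffle/reduce path
# across the cluster and is the phase whose wall-clock time should improve
# as you add nodes.
hadoop jar hadoop-*-examples.jar sort /benchmarks/random-data /benchmarks/sorted-data
```

Comparing the sort job's elapsed time on the 3-node and 15-node clusters (with the same input volume) gives a much more meaningful scaling measurement than the pi example.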
