Yes, I can.

On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <[email protected]> wrote:

> Can you ssh between the nodes?
>
> -jim
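Jim's question matters because Hadoop's bin/start-all.sh reaches every machine
listed in conf/slaves over ssh, so the master needs passwordless ssh to each
slave. A minimal check from the master (the user name and "slave01" are
illustrative):

    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub | ssh mithila@slave01 'cat >> ~/.ssh/authorized_keys'
    ssh slave01 hostname   # should print the slave's hostname with no password prompt

ssh-copy-id may not exist on a Red Hat 7.2 box, hence the cat pipe.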
> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <[email protected]> wrote:
>
> > Thanks Aaron.
> >
> > Jim: The three nodes I set up earlier had Ubuntu running on them, and the
> > DFS was accessed at port 54310. The new cluster I've set up runs Red Hat
> > Linux release 7.2 (Enigma). Now when I try to access the DFS from one of
> > the slaves, I get the following response: dfs cannot be accessed. When I
> > access the DFS through the master there is no problem, so I feel there is
> > a problem with the port. Any ideas? I did check the list of slaves; it
> > looks fine to me.
> >
> > Mithila
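A "dfs cannot be accessed" error on the slaves while the master works often
points at the NameNode address rather than the daemons themselves: if
fs.default.name resolves to localhost, only the master can reach port 54310,
and Red Hat 7.2's default ipchains/iptables rules may also block it. A sketch
of the relevant conf/hadoop-site.xml entry, which must be identical on every
node, assuming a 0.18/0.19-era config where "master" stands for the NameNode's
real hostname:

    <property>
      <name>fs.default.name</name>
      <value>hdfs://master:54310</value>
    </property>

A quick test from any slave is "telnet master 54310"; if that cannot connect,
the problem is DNS, /etc/hosts, or a firewall rather than Hadoop itself.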
> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <[email protected]> wrote:
> >
> > > Mithila,
> > >
> > > You said all the slaves were being utilized in the 3-node cluster. Which
> > > application did you run to test that, and what was your input size? If
> > > you tried the word count application on a 516 MB input file on both
> > > cluster setups, then some of your nodes in the 15-node cluster may not
> > > be running at all. Generally, one map task is assigned to each input
> > > split, and if you are running your cluster with the defaults, the splits
> > > are 64 MB each, so 516 MB yields only about 9 map tasks and at most 9 of
> > > your 15 nodes would do map work. I got confused when you said the
> > > Namenode seemed to do all the work. Can you check conf/slaves and make
> > > sure you put the names of all task trackers there? I also suggest
> > > comparing both clusters with a larger input size, say at least 5 GB, to
> > > really see a difference.
> > >
> > > Jim
> > >
> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <[email protected]> wrote:
> > >
> > > > In hadoop-*-examples.jar, use "randomwriter" to generate the data and
> > > > "sort" to sort it.
> > > > - Aaron
> > > >
> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <[email protected]> wrote:
> > > >
> > > > > Your data is too small, I guess, for 15 nodes, so the startup
> > > > > overhead of those nodes might be what makes your MR jobs more time
> > > > > consuming. I guess you will have to try with a larger set of data.
> > > > >
> > > > > Pankil
> > > > >
> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <[email protected]> wrote:
> > > > >
> > > > > > Aaron
> > > > > >
> > > > > > That could be the issue; my data is just 516 MB. Wouldn't this see
> > > > > > a bit of speedup? Could you guide me to the example? I'll run my
> > > > > > cluster on it and see what I get. Also, for my program I had a
> > > > > > Java timer running to record the time taken to complete execution.
> > > > > > Does Hadoop have an inbuilt timer?
> > > > > >
> > > > > > Mithila
> > > > > >
> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <[email protected]> wrote:
> > > > > >
> > > > > > > Virtually none of the examples that ship with Hadoop are
> > > > > > > designed to showcase its speed. Hadoop's speedup comes from its
> > > > > > > ability to process very large volumes of data (starting around,
> > > > > > > say, tens of GB per job, and going up in orders of magnitude
> > > > > > > from there). So if you are timing the pi calculator (or
> > > > > > > something like that), its results won't necessarily be very
> > > > > > > consistent. If a job doesn't have enough fragments of data to
> > > > > > > allocate one per node, some of the nodes will also just go
> > > > > > > unused.
> > > > > > >
> > > > > > > The best example for you to run is to use randomwriter to fill
> > > > > > > up your cluster with several GB of random data and then run the
> > > > > > > sort program. If that doesn't scale up performance from 3 nodes
> > > > > > > to 15, then you've definitely got something strange going on.
> > > > > > >
> > > > > > > - Aaron
> > > > > > >
> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hey all
> > > > > > > >
> > > > > > > > I recently set up a three-node Hadoop cluster and ran an
> > > > > > > > example on it. It was pretty fast, and all three nodes were
> > > > > > > > being used (I checked the log files to make sure the slaves
> > > > > > > > were utilized).
> > > > > > > >
> > > > > > > > Now I've set up another cluster consisting of 15 nodes. I ran
> > > > > > > > the same example, but instead of speeding up, the map-reduce
> > > > > > > > task seems to take forever! The slaves are not being used for
> > > > > > > > some reason. This second cluster has lower per-node processing
> > > > > > > > power, but should that make any difference? How can I ensure
> > > > > > > > that the data is being mapped to all the nodes? Presently, the
> > > > > > > > only node that seems to be doing all the work is the master
> > > > > > > > node.
> > > > > > > >
> > > > > > > > Does having 15 nodes in a cluster increase the network cost?
> > > > > > > > What can I do to set up the cluster to function more
> > > > > > > > efficiently?
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > Mithila Nagendra
> > > > > > > > Arizona State University
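For concreteness, the randomwriter-then-sort test Aaron describes amounts to
roughly the following on a 0.18/0.19-era install (output paths illustrative;
randomwriter writes on the order of 10 GB of random data per node by default,
so check free DFS space first):

    bin/hadoop jar hadoop-*-examples.jar randomwriter rand
    bin/hadoop jar hadoop-*-examples.jar sort rand rand-sort

The JobTracker web UI on the master (port 50030 by default) then shows how
many map and reduce tasks each node ran, along with per-job start and finish
times, which also answers the earlier question about an inbuilt timer.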
