Also, would the way the port is accessed change if all these nodes are
connected through a gateway? I mean in the hadoop-site.xml file? The Ubuntu
systems we worked with earlier didn't have a gateway.

Mithila
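For reference, the entry in question would look something like this (a sketch of a standard Hadoop 0.18-era conf/hadoop-site.xml; the hostname "node18" and port 54310 are taken from this thread, and every node has to resolve that hostname to the same reachable address, gateway or not):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- All nodes, master and slaves alike, point at the master's DFS port. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node18:54310</value>
  </property>
</configuration>
```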
On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <[email protected]> wrote:
> Aaron: Which log file do I look into - there are a lot of them. Here's
> what the error looks like:
>
> [mith...@node19:~]$ cd hadoop
> [mith...@node19:~/hadoop]$ bin/hadoop dfs -ls
> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> Bad connection to FS. command aborted.
>
> Node19 is a slave and Node18 is the master.
>
> Mithila
>
> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <[email protected]> wrote:
>
> > Are there any error messages in the log files on those nodes?
> > - Aaron
> >
> > On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <[email protected]> wrote:
> >
> > > I've drawn a blank here! Can't figure out what's wrong with the ports.
> > > I can ssh between the nodes but can't access the DFS from the slaves -
> > > says "Bad connection to DFS". Master seems to be fine.
> > > Mithila
> > >
> > > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <[email protected]> wrote:
> > >
> > > > Yes I can..
> > > >
> > > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <[email protected]> wrote:
> > > >
> > > > > Can you ssh between the nodes?
> > > > >
> > > > > -jim
> > > > >
> > > > > On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <[email protected]> wrote:
> > > > >
> > > > > > Thanks Aaron.
> > > > > > Jim: The three clusters I set up had Ubuntu running on them and
> > > > > > the DFS was accessed at port 54310. The new cluster which I've
> > > > > > set up has Red Hat Linux release 7.2 (Enigma) running on it.
> > > > > > Now when I try to access the DFS from one of the slaves I get
> > > > > > the following response: dfs cannot be accessed. When I access
> > > > > > the DFS through the master there's no problem. So I feel there's
> > > > > > a problem with the port. Any ideas? I did check the list of
> > > > > > slaves, it looks fine to me.
> > > > > >
> > > > > > Mithila
> > > > > >
> > > > > > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <[email protected]> wrote:
> > > > > >
> > > > > > > Mithila,
> > > > > > >
> > > > > > > You said all the slaves were being utilized in the 3 node
> > > > > > > cluster. Which application did you run to test that and what
> > > > > > > was your input size? If you tried the word count application
> > > > > > > on a 516 MB input file on both cluster setups, then some of
> > > > > > > your nodes in the 15 node cluster may not be running at all.
> > > > > > > Generally, one map job is assigned to each input split and if
> > > > > > > you are running your cluster with the defaults, the splits
> > > > > > > are 64 MB each.
> > > > > > > I got confused when you said the Namenode seemed to do all
> > > > > > > the work. Can you check conf/slaves and make sure you put the
> > > > > > > names of all the task trackers there? I also suggest comparing
> > > > > > > both clusters with a larger input size, say at least 5 GB, to
> > > > > > > really see a difference.
> > > > > > >
> > > > > > > Jim
> > > > > > >
> > > > > > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <[email protected]> wrote:
> > > > > > >
> > > > > > > > in hadoop-*-examples.jar, use "randomwriter" to generate
> > > > > > > > the data and "sort" to sort it.
> > > > > > > > - Aaron
> > > > > > > >
> > > > > > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Your data is too small, I guess, for a 15 node cluster,
> > > > > > > > > so it might be the overhead time of these nodes that is
> > > > > > > > > making your total MR jobs more time consuming. I guess
> > > > > > > > > you will have to try with a larger set of data..
> > > > > > > > >
> > > > > > > > > Pankil
> > > > > > > > >
> > > > > > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Aaron
> > > > > > > > > >
> > > > > > > > > > That could be the issue, my data is just 516 MB -
> > > > > > > > > > wouldn't this see a bit of speed up? Could you guide me
> > > > > > > > > > to the example? I'll run my cluster on it and see what
> > > > > > > > > > I get. Also for my program I had a Java timer running
> > > > > > > > > > to record the time taken to complete execution. Does
> > > > > > > > > > Hadoop have an inbuilt timer?
> > > > > > > > > > Mithila
> > > > > > > > > >
> > > > > > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Virtually none of the examples that ship with Hadoop
> > > > > > > > > > > are designed to showcase its speed. Hadoop's speedup
> > > > > > > > > > > comes from its ability to process very large volumes
> > > > > > > > > > > of data (starting around, say, tens of GB per job,
> > > > > > > > > > > and going up in orders of magnitude from there). So
> > > > > > > > > > > if you are timing the pi calculator (or something
> > > > > > > > > > > like that), its results won't necessarily be very
> > > > > > > > > > > consistent. If a job doesn't have enough fragments of
> > > > > > > > > > > data to allocate one per each node, some of the nodes
> > > > > > > > > > > will also just go unused.
> > > > > > > > > > >
> > > > > > > > > > > The best example for you to run is to use
> > > > > > > > > > > randomwriter to fill up your cluster with several GB
> > > > > > > > > > > of random data and then run the sort program. If
> > > > > > > > > > > that doesn't scale up performance from 3 nodes to 15,
> > > > > > > > > > > then you've definitely got something strange going on.
> > > > > > > > > > >
> > > > > > > > > > > - Aaron
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey all
> > > > > > > > > > > > I recently set up a three node Hadoop cluster and
> > > > > > > > > > > > ran an example on it.
> > > > > > > > > > > > It was pretty fast, and all the three nodes were
> > > > > > > > > > > > being used (I checked the log files to make sure
> > > > > > > > > > > > that the slaves are utilized).
> > > > > > > > > > > >
> > > > > > > > > > > > Now I've set up another cluster consisting of 15
> > > > > > > > > > > > nodes. I ran the same example, but instead of
> > > > > > > > > > > > speeding up, the map-reduce task seems to take
> > > > > > > > > > > > forever! The slaves are not being used for some
> > > > > > > > > > > > reason. This second cluster has a lower per node
> > > > > > > > > > > > processing power, but should that make any
> > > > > > > > > > > > difference? How can I ensure that the data is
> > > > > > > > > > > > being mapped to all the nodes? Presently, the only
> > > > > > > > > > > > node that seems to be doing all the work is the
> > > > > > > > > > > > Master node.
> > > > > > > > > > > >
> > > > > > > > > > > > Do 15 nodes in a cluster increase the network
> > > > > > > > > > > > cost? What can I do to set up the cluster to
> > > > > > > > > > > > function more efficiently?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks!
> > > > > > > > > > > > Mithila Nagendra
> > > > > > > > > > > > Arizona State University
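As a footnote to Jim's point about splits above, the arithmetic can be sketched like this (the 64 MB default split size and the 516 MB / 5 GB figures are the ones from this thread):

```python
import math

# Rough map-task count: Hadoop launches roughly one map task per input
# split (64 MB by default), so a small input cannot keep 15 nodes busy.
def num_splits(input_bytes, split_bytes=64 * 1024 * 1024):
    """Estimate the number of map tasks a job's input will produce."""
    return math.ceil(input_bytes / split_bytes)

print(num_splits(516 * 1024 * 1024))  # 516 MB input -> 9 splits
print(num_splits(5 * 1024 ** 3))      # 5 GB input   -> 80 splits
```

Nine splits spread over 15 nodes (each typically running two map slots) leaves most of the cluster idle, which matches the behaviour described above.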

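Since the retry loop in the log above suggests node19 simply cannot reach node18:54310 even though ssh works, a bare TCP probe separates a firewall or binding problem from a Hadoop configuration problem (a generic sketch, not part of Hadoop; the hostname and port are the ones from this thread):

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Run on node19: can_connect("node18", 54310)
# If this is False while ssh to node18 works, suspect a firewall rule or
# the namenode binding to localhost instead of its network interface.
```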