The log file hadoop-mithila-datanode-node19.log.2009-04-14 contains the following:
2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = node19/127.0.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.18.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
************************************************************/
2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/192.168.0.18:54310 not available yet, Zzzzz...
2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at node18/192.168.0.18:54310 not available yet, Zzzzz...
2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).

Hmmm, I still can't figure it out..

Mithila
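Two things stand out in the log above: the DataNode reports its own host as node19/127.0.0.1, and the NameNode at node18:54310 never becomes reachable. A quick sanity check from the slave, assuming telnet and getent are installed on these machines, might look like:

[mith...@node19:~]$ getent hosts node18 node19   # both should resolve to 192.168.0.x addresses, not 127.0.0.1
[mith...@node19:~]$ telnet node18 54310          # should connect if the NameNode is up and the port is not blocked

If telnet hangs, or getent maps either hostname to 127.0.0.1, the problem is likely in /etc/hosts or a firewall rather than in Hadoop itself.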
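For what it's worth, a gateway by itself shouldn't change the port: the slaves connect to whatever fs.default.name in conf/hadoop-site.xml points at, so the requirement is only that every node can resolve and reach that exact host:port. A minimal sketch of the relevant entry (the value here assumes this cluster's master and port):

<property>
  <name>fs.default.name</name>
  <value>hdfs://node18:54310</value>
</property>

The same value has to appear on the master and on every slave.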
>
> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <[email protected]> wrote:
>
>> Aaron: Which log file do I look into? There are a lot of them. Here's
>> what the error looks like:
>>
>> [mith...@node19:~]$ cd hadoop
>> [mith...@node19:~/hadoop]$ bin/hadoop dfs -ls
>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>> Bad connection to FS. command aborted.
>>
>> Node19 is a slave and Node18 is the master.
>>
>> Mithila
>>
>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <[email protected]> wrote:
>>
>>> Are there any error messages in the log files on those nodes?
>>> - Aaron
>>>
>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <[email protected]> wrote:
>>>
>>>> I've drawn a blank here! Can't figure out what's wrong with the ports.
>>>> I can ssh between the nodes but can't access the DFS from the slaves;
>>>> it says "Bad connection to DFS". The master seems to be fine.
>>>> Mithila
>>>>
>>>> On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <[email protected]> wrote:
>>>>
>>>>> Yes I can..
>>>>>
>>>>> On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <[email protected]> wrote:
>>>>>
>>>>>> Can you ssh between the nodes?
>>>>>>
>>>>>> -jim
>>>>>>
>>>>>> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Aaron.
>>>>>>> Jim: The three clusters I set up had Ubuntu running on them, and the
>>>>>>> DFS was accessed at port 54310. The new cluster which I've set up
>>>>>>> has Red Hat Linux release 7.2 (Enigma) running on it. Now when I try
>>>>>>> to access the DFS from one of the slaves, I get the following
>>>>>>> response: dfs cannot be accessed. When I access the DFS through the
>>>>>>> master there's no problem, so I feel there's a problem with the
>>>>>>> port. Any ideas? I did check the list of slaves; it looks fine to
>>>>>>> me.
>>>>>>>
>>>>>>> Mithila
>>>>>>>
>>>>>>> On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <[email protected]> wrote:
>>>>>>>
>>>>>>>> Mithila,
>>>>>>>>
>>>>>>>> You said all the slaves were being utilized in the 3-node cluster.
>>>>>>>> Which application did you run to test that, and what was your input
>>>>>>>> size? If you tried the word count application on a 516 MB input
>>>>>>>> file on both cluster setups, then some of your nodes in the 15-node
>>>>>>>> cluster may not be running at all. Generally, one map task is
>>>>>>>> assigned to each input split, and if you are running your cluster
>>>>>>>> with the defaults, the splits are 64 MB each; 516 MB comes to just
>>>>>>>> 9 splits, so at most 9 of your 15 nodes can receive any map work at
>>>>>>>> all. I got confused when you said the Namenode seemed to do all the
>>>>>>>> work. Can you check conf/slaves and make sure you put the names of
>>>>>>>> all the task trackers there? I also suggest comparing both clusters
>>>>>>>> with a larger input size, say at least 5 GB, to really see a
>>>>>>>> difference.
>>>>>>>>
>>>>>>>> Jim
>>>>>>>>
>>>>>>>> On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> In hadoop-*-examples.jar, use "randomwriter" to generate the data
>>>>>>>>> and "sort" to sort it.
>>>>>>>>> - Aaron
>>>>>>>>>
>>>>>>>>> On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Your data is too small, I guess, for 15 nodes, so the overhead of
>>>>>>>>>> the extra nodes might be making your MR jobs take more time in
>>>>>>>>>> total. I guess you will have to try with a larger set of data.
>>>>>>>>>>
>>>>>>>>>> Pankil
>>>>>>>>>>
>>>>>>>>>> On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Aaron
>>>>>>>>>>>
>>>>>>>>>>> That could be the issue; my data is just 516 MB. Wouldn't this
>>>>>>>>>>> still see a bit of a speedup? Could you guide me to the example?
>>>>>>>>>>> I'll run my cluster on it and see what I get. Also, for my
>>>>>>>>>>> program I had a Java timer running to record the time taken to
>>>>>>>>>>> complete execution. Does Hadoop have an inbuilt timer?
>>>>>>>>>>>
>>>>>>>>>>> Mithila
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Virtually none of the examples that ship with Hadoop are
>>>>>>>>>>>> designed to showcase its speed. Hadoop's speedup comes from its
>>>>>>>>>>>> ability to process very large volumes of data (starting around,
>>>>>>>>>>>> say, tens of GB per job, and going up in orders of magnitude
>>>>>>>>>>>> from there). So if you are timing the pi calculator (or
>>>>>>>>>>>> something like that), its results won't necessarily be very
>>>>>>>>>>>> consistent. If a job doesn't have enough fragments of data to
>>>>>>>>>>>> allocate one per node, some of the nodes will also just go
>>>>>>>>>>>> unused.
>>>>>>>>>>>>
>>>>>>>>>>>> The best example for you to run is to use randomwriter to fill
>>>>>>>>>>>> up your cluster with several GB of random data and then run the
>>>>>>>>>>>> sort program. If that doesn't scale up performance from 3 nodes
>>>>>>>>>>>> to 15, then you've definitely got something strange going on.
>>>>>>>>>>>>
>>>>>>>>>>>> - Aaron
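The runs Aaron describes would look roughly like this (the jar name assumes the 0.18.3 build shown in the log above; the HDFS paths are made up):

[mith...@node19:~/hadoop]$ bin/hadoop jar hadoop-0.18.3-examples.jar randomwriter rand-data
[mith...@node19:~/hadoop]$ bin/hadoop jar hadoop-0.18.3-examples.jar sort rand-data rand-sorted

randomwriter fills rand-data with several GB of random records spread across the cluster, and sort then reads and sorts them, which should exercise every node; both examples also print the elapsed job time when they finish, which may cover the timer question as well.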
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hey all
>>>>>>>>>>>>> I recently set up a three-node Hadoop cluster and ran an
>>>>>>>>>>>>> example on it. It was pretty fast, and all three nodes were
>>>>>>>>>>>>> being used (I checked the log files to make sure that the
>>>>>>>>>>>>> slaves were utilized).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Now I've set up another cluster consisting of 15 nodes. I ran
>>>>>>>>>>>>> the same example, but instead of speeding up, the map-reduce
>>>>>>>>>>>>> task seems to take forever! The slaves are not being used for
>>>>>>>>>>>>> some reason. This second cluster has lower per-node processing
>>>>>>>>>>>>> power, but should that make any difference? How can I ensure
>>>>>>>>>>>>> that the data is being mapped to all the nodes? Presently, the
>>>>>>>>>>>>> only node that seems to be doing all the work is the master
>>>>>>>>>>>>> node.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Do 15 nodes in a cluster increase the network cost? What can I
>>>>>>>>>>>>> do to set up the cluster to function more efficiently?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>> Mithila Nagendra
>>>>>>>>>>>>> Arizona State University
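On the conf/slaves point Jim raises above: the file is just a list of worker hostnames, one per line, read by the start-up scripts on the master. A sketch, assuming the workers are numbered upwards from node19 (only node18 and node19 are actually named in this thread; the rest are made up):

node19
node20
node21

...and so on, one line per worker: 14 entries for a 15-node cluster if node18 acts only as the master.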
