Hi, I wrote a blog post a while back about connecting nodes via a gateway. See http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
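A rough sketch of the tunnel setup that post describes, run from each datanode. The gateway host, user name (`hadoop@gateway`), and port below are illustrative assumptions, not details from this thread:

```shell
# Build the ssh port-forward each datanode would open so namenode RPC traffic
# goes through the gateway. Hostnames, user, and port are assumptions.
NN_HOST=node18
NN_PORT=54310
GATEWAY=hadoop@gateway
# -N: no remote command; -f: background after authenticating.
TUNNEL_CMD="ssh -f -N -L ${NN_PORT}:${NN_HOST}:${NN_PORT} ${GATEWAY}"
echo "$TUNNEL_CMD"   # run the printed command on every datanode
```

With such a tunnel up, each datanode would presumably address the namenode through the local end of the tunnel (e.g. hdfs://localhost:54310 in fs.default.name) rather than node18 directly.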
This assumes that the client is outside the gateway and all datanodes/namenode are inside, but the same principles apply. You'll just need to set up SSH tunnels from every datanode to the namenode.

- Aaron

On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <[email protected]> wrote:

Looks like your NameNode is down. Verify that the Hadoop processes are running (jps should show you all running Java processes). If your Hadoop processes are running, try restarting them. I guess this problem is due to your fsimage not being correct; you might have to format your namenode. Hope this helps.

Thanks,
--
Ravi

On 4/15/09 10:15 AM, "Mithila Nagendra" <[email protected]> wrote:

The log file runs into thousands of lines, with the same message displayed every time.

On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <[email protected]> wrote:

The log file hadoop-mithila-datanode-node19.log.2009-04-14 has the following in it:

2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG: host = node19/127.0.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.18.3
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
************************************************************/
2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
[... retries 1 through 8, one second apart ...]
2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/192.168.0.18:54310 not available yet, Zzzzz...
[... the same ten-retry cycle repeats; the excerpt ends at 10:08:37,155 after "Already tried 2 time(s)" ...]

Hmm, I still can't figure it out.

Mithila

On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <[email protected]> wrote:

Also, would the way the port is accessed change if all these nodes are connected through a gateway? I mean in the hadoop-site.xml file? The Ubuntu systems we worked with earlier didn't have a gateway.

Mithila

On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <[email protected]> wrote:

Aaron: Which log file do I look into? There are a lot of them.
Here's what the error looks like:

[mith...@node19:~]$ cd hadoop
[mith...@node19:~/hadoop]$ bin/hadoop dfs -ls
09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
[... retries 1 through 8 ...]
09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
Bad connection to FS. command aborted.

Node19 is a slave and Node18 is the master.

Mithila

On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <[email protected]> wrote:

Are there any error messages in the log files on those nodes?
- Aaron

On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <[email protected]> wrote:

I've drawn a blank here! I can't figure out what's wrong with the ports. I can ssh between the nodes but can't access the DFS from the slaves - it says "Bad connection to DFS". The master seems to be fine.

Mithila

On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <[email protected]> wrote:

Yes I can..

On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <[email protected]> wrote:

Can you ssh between the nodes?

-jim

On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <[email protected]> wrote:

Thanks Aaron.
Jim: The three clusters I set up had Ubuntu running on them, and the DFS was accessed at port 54310. The new cluster I've set up has Red Hat Linux release 7.2 (Enigma) running on it. Now when I try to access the DFS from one of the slaves, I get the following response: "dfs cannot be accessed". When I access the DFS through the master, there's no problem. So I feel there's a problem with the port. Any ideas? I did check the list of slaves; it looks fine to me.

Mithila

On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <[email protected]> wrote:

Mithila,

You said all the slaves were being utilized in the 3-node cluster. Which application did you run to test that, and what was your input size? If you tried the word count application on a 516 MB input file on both cluster setups, then some of your nodes in the 15-node cluster may not be running at all. Generally, one map task is assigned to each input split, and if you are running your cluster with the defaults, the splits are 64 MB each. I got confused when you said the Namenode seemed to do all the work. Can you check conf/slaves and make sure you put the names of all the task trackers there? I also suggest comparing both clusters with a larger input size, say at least 5 GB, to really see a difference.

Jim

On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <[email protected]> wrote:

In hadoop-*-examples.jar, use "randomwriter" to generate the data and "sort" to sort it.
- Aaron

On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <[email protected]> wrote:

Your data is too small, I guess, for a 15-node cluster, so the overhead time of those nodes might be making your total MR jobs more time-consuming. I guess you will have to try with a larger set of data.
Pankil

On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <[email protected]> wrote:

Aaron,

That could be the issue - my data is just 516 MB. Wouldn't this still see a bit of a speedup? Could you guide me to the example? I'll run my cluster on it and see what I get. Also, for my program I had a Java timer running to record the time taken to complete execution. Does Hadoop have an inbuilt timer?

Mithila

On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <[email protected]> wrote:

Virtually none of the examples that ship with Hadoop are designed to showcase its speed. Hadoop's speedup comes from its ability to process very large volumes of data (starting around, say, tens of GB per job, and going up in orders of magnitude from there). So if you are timing the pi calculator (or something like that), its results won't necessarily be very consistent. If a job doesn't have enough fragments of data to allocate one per node, some of the nodes will also just go unused.

The best example for you to run is to use randomwriter to fill up your cluster with several GB of random data and then run the sort program. If that doesn't scale up performance from 3 nodes to 15, then you've definitely got something strange going on.

- Aaron

On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]> wrote:

Hey all,

I recently set up a three-node Hadoop cluster and ran an example on it. It was pretty fast, and all three nodes were being used (I checked the log files to make sure that the slaves were utilized).

Now I've set up another cluster consisting of 15 nodes. I ran the same example, but instead of speeding up, the map-reduce task seems to take forever! The slaves are not being used for some reason. This second cluster has lower per-node processing power, but should that make any difference? How can I ensure that the data is being mapped to all the nodes? Presently, the only node that seems to be doing all the work is the master node.

Does having 15 nodes in a cluster increase the network cost? What can I do to set up the cluster to function more efficiently?

Thanks!
Mithila Nagendra
Arizona State University
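In numbers, the input-split arithmetic Jim describes above: with the default 64 MB split size, a 516 MB input produces only about nine map tasks, so most of a 15-node cluster has nothing to do. A quick sketch:

```shell
# Number of map tasks is roughly the number of input splits
# (ceiling of input size / split size).
INPUT_MB=516
SPLIT_MB=64     # Hadoop's default block/split size at the time
SPLITS=$(( (INPUT_MB + SPLIT_MB - 1) / SPLIT_MB ))
echo "approx. map tasks: $SPLITS"   # prints: approx. map tasks: 9
```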

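The randomwriter/sort benchmark Aaron suggests above would be invoked roughly as follows on a 0.18-era install, run from the Hadoop directory. The DFS output paths are invented for illustration, and the commands are guarded so the sketch is a no-op where bin/hadoop is absent:

```shell
# Fill the cluster with random data, then sort it; paths are illustrative.
if [ -x bin/hadoop ]; then
  bin/hadoop jar hadoop-*-examples.jar randomwriter rand-data
  bin/hadoop jar hadoop-*-examples.jar sort rand-data rand-sorted
  RAN=1
else
  echo "bin/hadoop not found; commands shown for illustration only"
  RAN=0
fi
```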