Hey Jason,

The problem is fixed! :) My network admin had messed something up; now it works. Thanks for your help!
Mithila

On Thu, Apr 16, 2009 at 11:58 PM, Mithila Nagendra <[email protected]> wrote:

Thanks Jason! This helps a lot. I'm planning to talk to my network admin tomorrow. I'm hoping he'll be able to fix this problem.

Mithila

On Fri, Apr 17, 2009 at 9:00 AM, jason hadoop <[email protected]> wrote:

Assuming you are on a Linux box, verify on both machines that the servers are listening on the ports you expect via:

    netstat -a -n -t -p

(-a: show sockets accepting connections; -n: do not translate IP addresses to host names; -t: only list TCP sockets; -p: list the pid/process name)

On the machine 192.168.0.18 you should have a socket bound to 0.0.0.0:54310 with a process name of java, and the pid should be the pid of your namenode process.

On the remote machine you should be able to run "telnet 192.168.0.18 54310" and have it connect:

    Connected to 192.168.0.18.
    Escape character is '^]'.

If netstat shows the socket accepting but the telnet does not connect, then something is blocking the TCP packets between the machines: one or both machines has a firewall, an intervening router has a firewall, or there is some routing problem. The command /sbin/iptables -L will normally list the firewall rules, if any, for a Linux machine.

You should be able to use telnet to verify that you can connect from the remote machine.

On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra <[email protected]> wrote:

Thanks! I'll see what I can find out.

On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop <[email protected]> wrote:

The firewall was run at system startup; I think there was an /etc/sysconfig/iptables file present which triggered the firewall. I don't currently have access to any CentOS 5 machines, so I can't easily check.
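Jason's telnet step can also be scripted so every slave can run the same check. Below is a minimal Python stand-in for "telnet host port" (a sketch, assuming Python is available on the nodes; the host and port are the thread's example values):

```python
import socket

def port_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This mirrors the manual "telnet 192.168.0.18 54310" test from the
    thread: if netstat on the namenode shows a listener but this check
    fails from a slave, something between the hosts (often a firewall)
    is dropping the packets.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from a slave, port_reachable("192.168.0.18", 54310) should return True when nothing is blocking the namenode port.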
On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop <[email protected]> wrote:

The kickstart script was something that the operations staff was using to initialize new machines. I never actually saw the script; I just figured out that there was a firewall in place.

On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra <[email protected]> wrote:

Jason: the kickstart script, was it something you wrote, or is it run when the system turns on?

Mithila

On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra <[email protected]> wrote:

Thanks Jason! Will check that out.

Mithila

On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop <[email protected]> wrote:

Double check that there is no firewall in place. At one point a bunch of new machines were kickstarted and placed in a cluster, and they all failed with something similar. It turned out the kickstart script had enabled the firewall with a rule that blocked ports in the 50k range. It took us a while to even think to check that it was not part of our normal machine configuration.

On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra <[email protected]> wrote:

Hi Aaron,

I will look into that, thanks! I spoke to the admin who looks after the cluster. He said that the gateway comes into the picture only when one of the nodes communicates with a node outside of the cluster. But in my case the communication is carried out between the nodes, which all belong to the same cluster.

Mithila

On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <[email protected]> wrote:

Hi,

I wrote a blog post a while back about connecting nodes via a gateway. See
http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/

This assumes that the client is outside the gateway and all datanodes/namenode are inside, but the same principles apply. You'll just need to set up ssh tunnels from every datanode to the namenode.

- Aaron

On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <[email protected]> wrote:

Looks like your NameNode is down. Verify that the hadoop processes are running (jps should show you all running Java processes). If your hadoop processes are running, try restarting them. I guess this problem is due to your fsimage not being correct; you might have to format your namenode. Hope this helps.
Thanks,
--
Ravi

On 4/15/09 10:15 AM, "Mithila Nagendra" <[email protected]> wrote:

The log file runs into thousands of lines, with the same message displayed every time.

On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <[email protected]> wrote:

The log file hadoop-mithila-datanode-node19.log.2009-04-14 has the following in it:

2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = node19/127.0.0.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.18.3
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
************************************************************/
2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
[... nine more identical retries, one per second ...]
2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/192.168.0.18:54310 not available yet, Zzzzz...
[the ten-retry cycle then repeats]

Hmm, I still can't figure it out.

Mithila

On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <[email protected]> wrote:

Also, would the way the port is accessed change if all these nodes are connected through a gateway? I mean in the hadoop-site.xml file? The Ubuntu systems we worked with earlier didn't have a gateway.
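For reference, the address the slaves use to reach the namenode is what hadoop-site.xml controls: in Hadoop 0.18.x it is the fs.default.name property. A sketch with this thread's host and port, on the assumption that node18 resolves to the same non-loopback address on every node:

```xml
<!-- conf/hadoop-site.xml (Hadoop 0.18.x); illustrative values from this thread -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node18:54310</value>
  </property>
</configuration>
```

Note that the datanode startup line above reports "host = node19/127.0.0.1"; hostnames that resolve to the loopback address via /etc/hosts are a common source of exactly this kind of connection trouble.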
Mithila

On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <[email protected]> wrote:

Aaron: Which log file do I look into? There are a lot of them. Here is what the error looks like:

[mith...@node19:~]$ cd hadoop
[mith...@node19:~/hadoop]$ bin/hadoop dfs -ls
09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
[... retries continue once per second ...]
09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
Bad connection to FS. command aborted.

Node19 is a slave and Node18 is the master.

Mithila

On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <[email protected]> wrote:

Are there any error messages in the log files on those nodes?

- Aaron

On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <[email protected]> wrote:

I've drawn a blank here! I can't figure out what's wrong with the ports. I can ssh between the nodes but can't access the DFS from the slaves; it says "Bad connection to DFS". The master seems to be fine.

Mithila

On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <[email protected]> wrote:

Yes I can..
On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <[email protected]> wrote:

Can you ssh between the nodes?

-jim

On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <[email protected]> wrote:

Thanks Aaron.

Jim: The three clusters I set up had Ubuntu running on them, and the dfs was accessed at port 54310. The new cluster I've set up has Red Hat Linux release 7.2 (Enigma) running on it. Now when I try to access the dfs from one of the slaves, I get the following response: dfs cannot be accessed. When I access the DFS through the master, there's no problem. So I feel there's a problem with the port. Any ideas? I did check the list of slaves; it looks fine to me.
Mithila

On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <[email protected]> wrote:

Mithila,

You said all the slaves were being utilized in the 3-node cluster. Which application did you run to test that, and what was your input size? If you tried the word count application on a 516 MB input file on both cluster setups, then some of your nodes in the 15-node cluster may not be running at all. Generally, one map task is assigned to each input split, and if you are running your cluster with the defaults, the splits are 64 MB each. I got confused when you said the Namenode seemed to do all the work. Can you check conf/slaves and make sure you put the names of all the task trackers there? I also suggest comparing both clusters with a larger input size, say at least 5 GB, to really see a difference.

Jim

On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <[email protected]> wrote:

In hadoop-*-examples.jar, use "randomwriter" to generate the data and "sort" to sort it.

- Aaron

On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <[email protected]> wrote:

Your data is too small for 15 nodes, I guess, so the startup overhead of those nodes might be making your total MR jobs more time-consuming. I guess you will have to try with a larger set of data.
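Jim's split arithmetic can be made concrete: with the default 64 MB split size, a 516 MB input yields only nine map tasks, so at most nine of the fifteen slaves get any map work. A back-of-the-envelope sketch (actual split counts also depend on how the input is laid out in files):

```python
import math

def num_map_tasks(input_bytes, split_bytes=64 * 1024 * 1024):
    """Rough count of map tasks: one per input split (Hadoop defaults)."""
    return math.ceil(input_bytes / split_bytes)

# The thread's example: a 516 MB input with default 64 MB splits.
splits = num_map_tasks(516 * 1024 * 1024)
print(splits)  # 9: at most 9 of the 15 slaves receive map work
```

With Jim's suggested 5 GB input the same arithmetic gives 80 splits, enough to keep all fifteen nodes busy.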
Pankil

On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <[email protected]> wrote:

Aaron,

That could be the issue; my data is just 516 MB. Wouldn't this see a bit of a speed-up? Could you guide me to the example? I'll run my cluster on it and see what I get. Also, for my program I had a Java timer running to record the time taken to complete execution. Does Hadoop have an inbuilt timer?
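On the timer question: the Hadoop job client prints per-job counters when a job finishes, and the JobTracker web UI shows per-job start and finish times, but an external wall-clock stopwatch works just as well for comparing clusters. A small hypothetical wrapper (the hadoop command shown in the comment is a placeholder, not something from the thread):

```python
import subprocess
import time

def timed_run(cmd):
    """Run a command and return (exit_code, elapsed_seconds) by wall clock."""
    start = time.monotonic()
    code = subprocess.call(cmd)
    return code, time.monotonic() - start

# Hypothetical usage for the benchmark discussed in this thread:
#   timed_run(["bin/hadoop", "jar", "hadoop-0.18.3-examples.jar",
#              "sort", "rand-in", "rand-out"])
```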
Mithila

On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <[email protected]> wrote:

Virtually none of the examples that ship with Hadoop are designed to showcase its speed. Hadoop's speedup comes from its ability to process very large volumes of data (starting around, say, tens of GB per job, and going up in orders of magnitude from there). So if you are timing the pi calculator (or something like that), its results won't necessarily be very consistent. If a job doesn't have enough fragments of data to allocate one per node, some of the nodes will also just go unused.

The best example for you to run is to use randomwriter to fill up your cluster with several GB of random data and then run the sort program. If that doesn't scale up performance from 3 nodes to 15, then you've definitely got something strange going on.

- Aaron

On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]> wrote:

Hey all,

I recently set up a three-node hadoop cluster and ran an example on it. It was pretty fast, and all three nodes were being used (I checked the log files to make sure that the slaves were utilized).

Now I've set up another cluster consisting of 15 nodes. I ran the same example, but instead of speeding up, the map-reduce task seems to take forever! The slaves are not being used for some reason. This second cluster has lower per-node processing power, but should that make any difference? How can I ensure that the data is being mapped to all the nodes? Presently, the only node that seems to be doing all the work is the Master node.

Does having 15 nodes in a cluster increase the network cost? What can I do to set up the cluster to function more efficiently?

Thanks!
Mithila Nagendra
Arizona State University

--
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
