Thanks Jason! This helps a lot. I'm planning to talk to my network admin tomorrow. I'm hoping he'll be able to fix this problem.
Mithila
On Fri, Apr 17, 2009 at 9:00 AM, jason hadoop <[email protected]>wrote: > Assuming you are on a linux box, on both machines > verify that the servers are listening on the ports you expect via > netstat -a -n -t -p > -a show sockets accepting connections > -n do not translate ip addresses to host names > -t only list tcp sockets > -p list the pid/process name > > on the machine 192.168.0.18 > you should have sockets bound to 0.0.0.0:54310 with a process of java, and > the pid should be the pid of your namenode process. > > On the remote machine you should be able to *telnet 192.168.0.18 54310* and > have it connect > *Connected to 192.168.0.18. > Escape character is '^]'. > * > > If the netstat shows the socket accepting and the telnet does not connect, > then something is blocking the TCP packets between the machines. one or > both > machines has a firewall, an intervening router has a firewall, or there is > some routing problem > the command /sbin/iptables -L will normally list the firewall rules, if any > for a linux machine. > > > You should be able to use telnet to verify that you can connect from the > remote machine. > > On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra <[email protected]> > wrote: > > > Thanks! I ll see what I can find out. > > > > On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop <[email protected] > > >wrote: > > > > > The firewall was run at system startup, I think there was a > > > /etc/sysconfig/iptables file present which triggered the firewall. > > > I don't currently have access to any centos 5 machines so I can't > easily > > > check. > > > > > > > > > > > > On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop <[email protected] > > > >wrote: > > > > > > > The kickstart script was something that the operations staff was > using > > to > > > > initialize new machines, I never actually saw the script, just > figured > > > out > > > > that there was a firewall in place. 
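[The diagnostic sequence Jason describes — netstat on the namenode, a telnet-style connect from the datanode, then iptables if the two disagree — can be sketched as below. This assumes a Linux box, the default namenode RPC port 54310, and the addresses from this thread; bash's `/dev/tcp` stands in for telnet.]

```shell
#!/usr/bin/env bash
# Sketch of Jason's checks; substitute your own namenode address/port.
NAMENODE_HOST=192.168.0.18
NAMENODE_PORT=54310

# Step 1, on the namenode: is a java process listening on the port?
# (-a accepting sockets, -n numeric addresses, -t TCP only, -p pid/name)
netstat -a -n -t -p 2>/dev/null | grep ":${NAMENODE_PORT}"

# Step 2, from a datanode: can we actually open a TCP connection?
check_port() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

if check_port "$NAMENODE_HOST" "$NAMENODE_PORT"; then
  echo "port reachable"
else
  echo "blocked or not listening"
fi

# Step 3: if netstat shows a listener but the connect fails, something
# is dropping packets -- list the firewall rules (may require root):
/sbin/iptables -L 2>/dev/null
```

[If step 1 succeeds on the namenode and step 2 fails on the datanode, that points at a firewall or routing problem between the machines, exactly as described above.]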
> > > > > > > > > > > > > > > > On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra <[email protected] > > > >wrote: > > > > > > > >> Jason: the kickstart script - was it something you wrote or is it > run > > > when > > > >> the system turns on? > > > >> Mithila > > > >> > > > >> On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra <[email protected] > > > > > >> wrote: > > > >> > > > >> > Thanks Jason! Will check that out. > > > >> > Mithila > > > >> > > > > >> > > > > >> > On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop < > > [email protected] > > > >> >wrote: > > > >> > > > > >> >> Double check that there is no firewall in place. > > > >> >> At one point a bunch of new machines were kickstarted and placed > in > > a > > > >> >> cluster and they all failed with something similar. > > > >> >> It turned out the kickstart script turned enabled the firewall > with > > a > > > >> rule > > > >> >> that blocked ports in the 50k range. > > > >> >> It took us a while to even think to check that was not a part of > > our > > > >> >> normal > > > >> >> machine configuration > > > >> >> > > > >> >> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra < > > [email protected] > > > > > > > >> >> wrote: > > > >> >> > > > >> >> > Hi Aaron > > > >> >> > I will look into that thanks! > > > >> >> > > > > >> >> > I spoke to the admin who overlooks the cluster. He said that > the > > > >> gateway > > > >> >> > comes in to the picture only when one of the nodes communicates > > > with > > > >> a > > > >> >> node > > > >> >> > outside of the cluster. But in my case the communication is > > carried > > > >> out > > > >> >> > between the nodes which all belong to the same cluster. > > > >> >> > > > > >> >> > Mithila > > > >> >> > > > > >> >> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball < > > [email protected] > > > > > > > >> >> wrote: > > > >> >> > > > > >> >> > > Hi, > > > >> >> > > > > > >> >> > > I wrote a blog post a while back about connecting nodes via a > > > >> gateway. 
> > > >> >> > See > > > >> >> > > > > > >> >> > > > > >> >> > > > >> > > > > > > http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/ > > > >> >> > > > > > >> >> > > This assumes that the client is outside the gateway and all > > > >> >> > > datanodes/namenode are inside, but the same principles apply. > > > >> You'll > > > >> >> just > > > >> >> > > need to set up ssh tunnels from every datanode to the > namenode. > > > >> >> > > > > > >> >> > > - Aaron > > > >> >> > > > > > >> >> > > > > > >> >> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari < > > > >> >> [email protected] > > > >> >> > >wrote: > > > >> >> > > > > > >> >> > >> Looks like your NameNode is down . > > > >> >> > >> Verify if hadoop process are running ( jps should show you > > all > > > >> java > > > >> >> > >> running process). > > > >> >> > >> If your hadoop process are running try restarting your > hadoop > > > >> process > > > >> >> . > > > >> >> > >> I guess this problem is due to your fsimage not being > correct > > . > > > >> >> > >> You might have to format your namenode. > > > >> >> > >> Hope this helps. > > > >> >> > >> > > > >> >> > >> Thanks, > > > >> >> > >> -- > > > >> >> > >> Ravi > > > >> >> > >> > > > >> >> > >> > > > >> >> > >> On 4/15/09 10:15 AM, "Mithila Nagendra" <[email protected]> > > > wrote: > > > >> >> > >> > > > >> >> > >> The log file runs into thousands of line with the same > message > > > >> being > > > >> >> > >> displayed every time. 
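[Aaron's ssh-tunnel suggestion above could be sketched roughly as follows; `user@gateway` is a placeholder, and the exact recipe is in the linked Cloudera post.]

```shell
# Rough sketch: from a machine outside the gateway, forward local port
# 54310 through the gateway to the namenode (node18). Clients then
# point fs.default.name at hdfs://localhost:54310.
# -f background after auth, -N no remote command, -L local forward
ssh -f -N -L 54310:192.168.0.18:54310 user@gateway
```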
> > > >> >> > >> > > > >> >> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra < > > > >> [email protected]> > > > >> >> > >> wrote: > > > >> >> > >> > > > >> >> > >> > The log file : > hadoop-mithila-datanode-node19.log.2009-04-14 > > > has > > > >> >> the > > > >> >> > >> > following in it: > > > >> >> > >> > > > > >> >> > >> > 2009-04-14 10:08:11,499 INFO > org.apache.hadoop.dfs.DataNode: > > > >> >> > >> STARTUP_MSG: > > > >> >> > >> > > > /************************************************************ > > > >> >> > >> > STARTUP_MSG: Starting DataNode > > > >> >> > >> > STARTUP_MSG: host = node19/127.0.0.1 > > > >> >> > >> > STARTUP_MSG: args = [] > > > >> >> > >> > STARTUP_MSG: version = 0.18.3 > > > >> >> > >> > STARTUP_MSG: build = > > > >> >> > >> > > > > >> https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18-r > > > >> >> > >> > 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC > 2009 > > > >> >> > >> > > > ************************************************************/ > > > >> >> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 0 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 1 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 2 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 3 > > > time(s). 
> > > >> >> > >> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 4 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 5 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 6 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 7 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 8 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 9 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: > > Server > > > >> at > > > >> >> > >> node18/ > > > >> >> > >> > 192.168.0.18:54310 not available yet, Zzzzz... > > > >> >> > >> > 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 0 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 1 > > > time(s). 
> > > >> >> > >> > 2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 2 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 3 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 4 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 5 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 6 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 7 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 8 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 9 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: > > Server > > > >> at > > > >> >> > >> node18/ > > > >> >> > >> > 192.168.0.18:54310 not available yet, Zzzzz... 
> > > >> >> > >> > 2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 0 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 1 > > > time(s). > > > >> >> > >> > 2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: > > > >> Retrying > > > >> >> > >> connect > > > >> >> > >> > to server: node18/192.168.0.18:54310. Already tried 2 > > > time(s). > > > >> >> > >> > > > > >> >> > >> > > > > >> >> > >> > Hmmm I still cant figure it out.. > > > >> >> > >> > > > > >> >> > >> > Mithila > > > >> >> > >> > > > > >> >> > >> > > > > >> >> > >> > On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra < > > > >> >> [email protected] > > > >> >> > >> >wrote: > > > >> >> > >> > > > > >> >> > >> >> Also, Would the way the port is accessed change if all > > these > > > >> node > > > >> >> are > > > >> >> > >> >> connected through a gateway? I mean in the > hadoop-site.xml > > > >> file? > > > >> >> The > > > >> >> > >> Ubuntu > > > >> >> > >> >> systems we worked with earlier didnt have a gateway. > > > >> >> > >> >> Mithila > > > >> >> > >> >> > > > >> >> > >> >> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra < > > > >> >> [email protected] > > > >> >> > >> >wrote: > > > >> >> > >> >> > > > >> >> > >> >>> Aaron: Which log file do I look into - there are alot of > > > them. > > > >> >> Here > > > >> >> > s > > > >> >> > >> >>> what the error looks like: > > > >> >> > >> >>> [mith...@node19:~]$ cd hadoop > > > >> >> > >> >>> [mith...@node19:~/hadoop]$ bin/hadoop dfs -ls > > > >> >> > >> >>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to > > > server: > > > >> >> > node18/ > > > >> >> > >> >>> 192.168.0.18:54310. Already tried 0 time(s). 
> > > >> >> > >> >>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to > > > server: > > > >> >> > node18/ > > > >> >> > >> >>> 192.168.0.18:54310. Already tried 1 time(s). > > > >> >> > >> >>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to > > > server: > > > >> >> > node18/ > > > >> >> > >> >>> 192.168.0.18:54310. Already tried 2 time(s). > > > >> >> > >> >>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to > > > server: > > > >> >> > node18/ > > > >> >> > >> >>> 192.168.0.18:54310. Already tried 3 time(s). > > > >> >> > >> >>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to > > > server: > > > >> >> > node18/ > > > >> >> > >> >>> 192.168.0.18:54310. Already tried 4 time(s). > > > >> >> > >> >>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to > > > server: > > > >> >> > node18/ > > > >> >> > >> >>> 192.168.0.18:54310. Already tried 5 time(s). > > > >> >> > >> >>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to > > > server: > > > >> >> > node18/ > > > >> >> > >> >>> 192.168.0.18:54310. Already tried 6 time(s). > > > >> >> > >> >>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to > > > server: > > > >> >> > node18/ > > > >> >> > >> >>> 192.168.0.18:54310. Already tried 7 time(s). > > > >> >> > >> >>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to > > > server: > > > >> >> > node18/ > > > >> >> > >> >>> 192.168.0.18:54310. Already tried 8 time(s). > > > >> >> > >> >>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to > > > server: > > > >> >> > node18/ > > > >> >> > >> >>> 192.168.0.18:54310. Already tried 9 time(s). > > > >> >> > >> >>> Bad connection to FS. command aborted. > > > >> >> > >> >>> > > > >> >> > >> >>> Node19 is a slave and Node18 is the master. 
> > > >> >> > >> >>> > > > >> >> > >> >>> Mithila > > > >> >> > >> >>> > > > >> >> > >> >>> > > > >> >> > >> >>> > > > >> >> > >> >>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball < > > > >> >> [email protected] > > > >> >> > >> >wrote: > > > >> >> > >> >>> > > > >> >> > >> >>>> Are there any error messages in the log files on those > > > nodes? > > > >> >> > >> >>>> - Aaron > > > >> >> > >> >>>> > > > >> >> > >> >>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra < > > > >> >> > [email protected]> > > > >> >> > >> >>>> wrote: > > > >> >> > >> >>>> > > > >> >> > >> >>>> > I ve drawn a blank here! Can't figure out what s > wrong > > > with > > > >> >> the > > > >> >> > >> ports. > > > >> >> > >> >>>> I > > > >> >> > >> >>>> > can > > > >> >> > >> >>>> > ssh between the nodes but cant access the DFS from > the > > > >> slaves > > > >> >> - > > > >> >> > >> says > > > >> >> > >> >>>> "Bad > > > >> >> > >> >>>> > connection to DFS". Master seems to be fine. > > > >> >> > >> >>>> > Mithila > > > >> >> > >> >>>> > > > > >> >> > >> >>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra < > > > >> >> > >> [email protected]> > > > >> >> > >> >>>> > wrote: > > > >> >> > >> >>>> > > > > >> >> > >> >>>> > > Yes I can.. > > > >> >> > >> >>>> > > > > > >> >> > >> >>>> > > > > > >> >> > >> >>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky < > > > >> >> > >> [email protected] > > > >> >> > >> >>>> > >wrote: > > > >> >> > >> >>>> > > > > > >> >> > >> >>>> > >> Can you ssh between the nodes? > > > >> >> > >> >>>> > >> > > > >> >> > >> >>>> > >> -jim > > > >> >> > >> >>>> > >> > > > >> >> > >> >>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra > < > > > >> >> > >> >>>> [email protected]> > > > >> >> > >> >>>> > >> wrote: > > > >> >> > >> >>>> > >> > > > >> >> > >> >>>> > >> > Thanks Aaron. 
> > > >> >> > >> >>>> > >> > Jim: The three clusters I setup had ubuntu > running > > > on > > > >> >> them > > > >> >> > and > > > >> >> > >> >>>> the dfs > > > >> >> > >> >>>> > >> was > > > >> >> > >> >>>> > >> > accessed at port 54310. The new cluster which I > ve > > > >> setup > > > >> >> has > > > >> >> > >> Red > > > >> >> > >> >>>> Hat > > > >> >> > >> >>>> > >> Linux > > > >> >> > >> >>>> > >> > release 7.2 (Enigma)running on it. Now when I > try > > to > > > >> >> access > > > >> >> > >> the > > > >> >> > >> >>>> dfs > > > >> >> > >> >>>> > from > > > >> >> > >> >>>> > >> > one > > > >> >> > >> >>>> > >> > of the slaves i get the following response: dfs > > > cannot > > > >> be > > > >> >> > >> >>>> accessed. > > > >> >> > >> >>>> > When > > > >> >> > >> >>>> > >> I > > > >> >> > >> >>>> > >> > access the DFS throught the master there s no > > > problem. > > > >> So > > > >> >> I > > > >> >> > >> feel > > > >> >> > >> >>>> there > > > >> >> > >> >>>> > a > > > >> >> > >> >>>> > >> > problem with the port. Any ideas? I did check > the > > > list > > > >> of > > > >> >> > >> slaves, > > > >> >> > >> >>>> it > > > >> >> > >> >>>> > >> looks > > > >> >> > >> >>>> > >> > fine to me. > > > >> >> > >> >>>> > >> > > > > >> >> > >> >>>> > >> > Mithila > > > >> >> > >> >>>> > >> > > > > >> >> > >> >>>> > >> > > > > >> >> > >> >>>> > >> > > > > >> >> > >> >>>> > >> > > > > >> >> > >> >>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky < > > > >> >> > >> >>>> [email protected]> > > > >> >> > >> >>>> > >> > wrote: > > > >> >> > >> >>>> > >> > > > > >> >> > >> >>>> > >> > > Mithila, > > > >> >> > >> >>>> > >> > > > > > >> >> > >> >>>> > >> > > You said all the slaves were being utilized in > > the > > > 3 > > > >> >> node > > > >> >> > >> >>>> cluster. > > > >> >> > >> >>>> > >> Which > > > >> >> > >> >>>> > >> > > application did you run to test that and what > > was > > > >> your > > > >> >> > input > > > >> >> > >> >>>> size? 
> > > >> >> > >> >>>> > If > > > >> >> > >> >>>> > >> you > > > >> >> > >> >>>> > >> > > tried the word count application on a 516 MB > > input > > > >> file > > > >> >> on > > > >> >> > >> both > > > >> >> > >> >>>> > >> cluster > > > >> >> > >> >>>> > >> > > setups, than some of your nodes in the 15 node > > > >> cluster > > > >> >> may > > > >> >> > >> not > > > >> >> > >> >>>> be > > > >> >> > >> >>>> > >> running > > > >> >> > >> >>>> > >> > > at > > > >> >> > >> >>>> > >> > > all. Generally, one map job is assigned to > each > > > >> input > > > >> >> > split > > > >> >> > >> and > > > >> >> > >> >>>> if > > > >> >> > >> >>>> > you > > > >> >> > >> >>>> > >> > are > > > >> >> > >> >>>> > >> > > running your cluster with the defaults, the > > splits > > > >> are > > > >> >> 64 > > > >> >> > MB > > > >> >> > >> >>>> each. I > > > >> >> > >> >>>> > >> got > > > >> >> > >> >>>> > >> > > confused when you said the Namenode seemed to > do > > > all > > > >> >> the > > > >> >> > >> work. > > > >> >> > >> >>>> Can > > > >> >> > >> >>>> > you > > > >> >> > >> >>>> > >> > > check > > > >> >> > >> >>>> > >> > > conf/slaves and make sure you put the names of > > all > > > >> task > > > >> >> > >> >>>> trackers > > > >> >> > >> >>>> > >> there? I > > > >> >> > >> >>>> > >> > > also suggest comparing both clusters with a > > larger > > > >> >> input > > > >> >> > >> size, > > > >> >> > >> >>>> say > > > >> >> > >> >>>> > at > > > >> >> > >> >>>> > >> > least > > > >> >> > >> >>>> > >> > > 5 GB, to really see a difference. 
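[Jim's point about input splits can be checked with back-of-the-envelope arithmetic: with the default 64 MB split size, a 516 MB input yields only about nine map tasks, so a 15-node cluster cannot keep every node busy.]

```shell
# Approximate map-task count: one map per input split, 64 MB default
# split size, 516 MB input (ceiling division).
INPUT_MB=516
SPLIT_MB=64
SPLITS=$(( (INPUT_MB + SPLIT_MB - 1) / SPLIT_MB ))
echo "approx map tasks: ${SPLITS}"   # 9 -- fewer than 15 nodes
```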
> > > >> >> > >> >>>> > >> > > > > > >> >> > >> >>>> > >> > > Jim > > > >> >> > >> >>>> > >> > > > > > >> >> > >> >>>> > >> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball > < > > > >> >> > >> >>>> [email protected]> > > > >> >> > >> >>>> > >> > wrote: > > > >> >> > >> >>>> > >> > > > > > >> >> > >> >>>> > >> > > > in hadoop-*-examples.jar, use "randomwriter" > > to > > > >> >> generate > > > >> >> > >> the > > > >> >> > >> >>>> data > > > >> >> > >> >>>> > >> and > > > >> >> > >> >>>> > >> > > > "sort" > > > >> >> > >> >>>> > >> > > > to sort it. > > > >> >> > >> >>>> > >> > > > - Aaron > > > >> >> > >> >>>> > >> > > > > > > >> >> > >> >>>> > >> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil > Doshi > > < > > > >> >> > >> >>>> > [email protected]> > > > >> >> > >> >>>> > >> > > wrote: > > > >> >> > >> >>>> > >> > > > > > > >> >> > >> >>>> > >> > > > > Your data is too small I guess for 15 > > clusters > > > >> ..So > > > >> >> it > > > >> >> > >> >>>> might be > > > >> >> > >> >>>> > >> > > overhead > > > >> >> > >> >>>> > >> > > > > time of these clusters making your total > MR > > > jobs > > > >> >> more > > > >> >> > >> time > > > >> >> > >> >>>> > >> consuming. > > > >> >> > >> >>>> > >> > > > > I guess you will have to try with larger > set > > > of > > > >> >> data.. 
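[The randomwriter/sort benchmark Aaron mentions might look like this on a 0.18.x install; the jar name and HDFS output paths here are placeholders, not taken from the thread.]

```shell
# Fill the cluster with random data, then sort it -- the benchmark
# Aaron suggests for comparing the 3-node and 15-node clusters.
bin/hadoop jar hadoop-0.18.3-examples.jar randomwriter rand-data
bin/hadoop jar hadoop-0.18.3-examples.jar sort rand-data rand-sorted
```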
> > > >> >> > >> >>>> > >> > > > > > > > >> >> > >> >>>> > >> > > > > Pankil > > > >> >> > >> >>>> > >> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila > > > >> Nagendra < > > > >> >> > >> >>>> > >> [email protected]> > > > >> >> > >> >>>> > >> > > > > wrote: > > > >> >> > >> >>>> > >> > > > > > > > >> >> > >> >>>> > >> > > > > > Aaron > > > >> >> > >> >>>> > >> > > > > > > > > >> >> > >> >>>> > >> > > > > > That could be the issue, my data is just > > > 516MB > > > >> - > > > >> >> > >> wouldn't > > > >> >> > >> >>>> this > > > >> >> > >> >>>> > >> see > > > >> >> > >> >>>> > >> > a > > > >> >> > >> >>>> > >> > > > bit > > > >> >> > >> >>>> > >> > > > > of > > > >> >> > >> >>>> > >> > > > > > speed up? > > > >> >> > >> >>>> > >> > > > > > Could you guide me to the example? I ll > > run > > > my > > > >> >> > cluster > > > >> >> > >> on > > > >> >> > >> >>>> it > > > >> >> > >> >>>> > and > > > >> >> > >> >>>> > >> > see > > > >> >> > >> >>>> > >> > > > what > > > >> >> > >> >>>> > >> > > > > I > > > >> >> > >> >>>> > >> > > > > > get. Also for my program I had a java > > timer > > > >> >> running > > > >> >> > to > > > >> >> > >> >>>> record > > > >> >> > >> >>>> > >> the > > > >> >> > >> >>>> > >> > > time > > > >> >> > >> >>>> > >> > > > > > taken > > > >> >> > >> >>>> > >> > > > > > to complete execution. Does Hadoop have > an > > > >> >> inbuilt > > > >> >> > >> timer? 
> > > >> >> > >> >>>> > >> > > > > > > > > >> >> > >> >>>> > >> > > > > > Mithila > > > >> >> > >> >>>> > >> > > > > > > > > >> >> > >> >>>> > >> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron > > > Kimball > > > >> < > > > >> >> > >> >>>> > >> [email protected] > > > >> >> > >> >>>> > >> > > > > > >> >> > >> >>>> > >> > > > > wrote: > > > >> >> > >> >>>> > >> > > > > > > > > >> >> > >> >>>> > >> > > > > > > Virtually none of the examples that > ship > > > >> with > > > >> >> > Hadoop > > > >> >> > >> >>>> are > > > >> >> > >> >>>> > >> designed > > > >> >> > >> >>>> > >> > > to > > > >> >> > >> >>>> > >> > > > > > > showcase its speed. Hadoop's speedup > > comes > > > >> from > > > >> >> > its > > > >> >> > >> >>>> ability > > > >> >> > >> >>>> > to > > > >> >> > >> >>>> > >> > > > process > > > >> >> > >> >>>> > >> > > > > > very > > > >> >> > >> >>>> > >> > > > > > > large volumes of data (starting > around, > > > say, > > > >> >> tens > > > >> >> > of > > > >> >> > >> GB > > > >> >> > >> >>>> per > > > >> >> > >> >>>> > >> job, > > > >> >> > >> >>>> > >> > > and > > > >> >> > >> >>>> > >> > > > > > going > > > >> >> > >> >>>> > >> > > > > > > up in orders of magnitude from there). > > So > > > if > > > >> >> you > > > >> >> > are > > > >> >> > >> >>>> timing > > > >> >> > >> >>>> > >> the > > > >> >> > >> >>>> > >> > pi > > > >> >> > >> >>>> > >> > > > > > > calculator (or something like that), > its > > > >> >> results > > > >> >> > >> won't > > > >> >> > >> >>>> > >> > necessarily > > > >> >> > >> >>>> > >> > > be > > > >> >> > >> >>>> > >> > > > > > very > > > >> >> > >> >>>> > >> > > > > > > consistent. If a job doesn't have > enough > > > >> >> fragments > > > >> >> > >> of > > > >> >> > >> >>>> data > > > >> >> > >> >>>> > to > > > >> >> > >> >>>> > >> > > > allocate > > > >> >> > >> >>>> > >> > > > > > one > > > >> >> > >> >>>> > >> > > > > > > per each node, some of the nodes will > > also > > > >> just > > > >> >> go > > > >> >> > >> >>>> unused. 
> > > >> >> > >> >>>> > >> > > > > > > > > > >> >> > >> >>>> > >> > > > > > > The best example for you to run is to > > use > > > >> >> > >> randomwriter > > > >> >> > >> >>>> to > > > >> >> > >> >>>> > fill > > > >> >> > >> >>>> > >> up > > > >> >> > >> >>>> > >> > > > your > > > >> >> > >> >>>> > >> > > > > > > cluster with several GB of random data > > and > > > >> then > > > >> >> > run > > > >> >> > >> the > > > >> >> > >> >>>> sort > > > >> >> > >> >>>> > >> > > program. > > > >> >> > >> >>>> > >> > > > > If > > > >> >> > >> >>>> > >> > > > > > > that doesn't scale up performance from > 3 > > > >> nodes > > > >> >> to > > > >> >> > >> 15, > > > >> >> > >> >>>> then > > > >> >> > >> >>>> > >> you've > > > >> >> > >> >>>> > >> > > > > > > definitely > > > >> >> > >> >>>> > >> > > > > > > got something strange going on. > > > >> >> > >> >>>> > >> > > > > > > > > > >> >> > >> >>>> > >> > > > > > > - Aaron > > > >> >> > >> >>>> > >> > > > > > > > > > >> >> > >> >>>> > >> > > > > > > > > > >> >> > >> >>>> > >> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, > Mithila > > > >> >> Nagendra > > > >> >> > < > > > >> >> > >> >>>> > >> > > [email protected]> > > > >> >> > >> >>>> > >> > > > > > > wrote: > > > >> >> > >> >>>> > >> > > > > > > > > > >> >> > >> >>>> > >> > > > > > > > Hey all > > > >> >> > >> >>>> > >> > > > > > > > I recently setup a three node hadoop > > > >> cluster > > > >> >> and > > > >> >> > >> ran > > > >> >> > >> >>>> an > > > >> >> > >> >>>> > >> > examples > > > >> >> > >> >>>> > >> > > on > > > >> >> > >> >>>> > >> > > > > it. 
> > > >> >> > >> >>>> > >> > > > > > > It > > > >> >> > >> >>>> > >> > > > > > > > was pretty fast, and all the three > > nodes > > > >> were > > > >> >> > >> being > > > >> >> > >> >>>> used > > > >> >> > >> >>>> > (I > > > >> >> > >> >>>> > >> > > checked > > > >> >> > >> >>>> > >> > > > > the > > > >> >> > >> >>>> > >> > > > > > > log > > > >> >> > >> >>>> > >> > > > > > > > files to make sure that the slaves > are > > > >> >> > utilized). > > > >> >> > >> >>>> > >> > > > > > > > > > > >> >> > >> >>>> > >> > > > > > > > Now I ve setup another cluster > > > consisting > > > >> of > > > >> >> 15 > > > >> >> > >> >>>> nodes. I > > > >> >> > >> >>>> > ran > > > >> >> > >> >>>> > >> > the > > > >> >> > >> >>>> > >> > > > same > > > >> >> > >> >>>> > >> > > > > > > > example, but instead of speeding up, > > the > > > >> >> > >> map-reduce > > > >> >> > >> >>>> task > > > >> >> > >> >>>> > >> seems > > > >> >> > >> >>>> > >> > to > > > >> >> > >> >>>> > >> > > > > take > > > >> >> > >> >>>> > >> > > > > > > > forever! The slaves are not being > used > > > for > > > >> >> some > > > >> >> > >> >>>> reason. > > > >> >> > >> >>>> > This > > > >> >> > >> >>>> > >> > > second > > > >> >> > >> >>>> > >> > > > > > > cluster > > > >> >> > >> >>>> > >> > > > > > > > has a lower, per node processing > > power, > > > >> but > > > >> >> > should > > > >> >> > >> >>>> that > > > >> >> > >> >>>> > make > > > >> >> > >> >>>> > >> > any > > > >> >> > >> >>>> > >> > > > > > > > difference? > > > >> >> > >> >>>> > >> > > > > > > > How can I ensure that the data is > > being > > > >> >> mapped > > > >> >> > to > > > >> >> > >> all > > > >> >> > >> >>>> the > > > >> >> > >> >>>> > >> > nodes? > > > >> >> > >> >>>> > >> > > > > > > Presently, > > > >> >> > >> >>>> > >> > > > > > > > the only node that seems to be doing > > all > > > >> the > > > >> >> > work > > > >> >> > >> is > > > >> >> > >> >>>> the > > > >> >> > >> >>>> > >> Master > > > >> >> > >> >>>> > >> > > > node. 
> > > >> >> > >> >>>> > >> > > > > > > > > > > >> >> > >> >>>> > >> > > > > > > > Does 15 nodes in a cluster increase > > the > > > >> >> network > > > >> >> > >> cost? > > > >> >> > >> >>>> What > > > >> >> > >> >>>> > >> can > > > >> >> > >> >>>> > >> > I > > > >> >> > >> >>>> > >> > > do > > > >> >> > >> >>>> > >> > > > > to > > > >> >> > >> >>>> > >> > > > > > > > setup > > > >> >> > >> >>>> > >> > > > > > > > the cluster to function more > > > efficiently? > > > >> >> > >> >>>> > >> > > > > > > > > > > >> >> > >> >>>> > >> > > > > > > > Thanks! > > > >> >> > >> >>>> > >> > > > > > > > Mithila Nagendra > > > >> >> > >> >>>> > >> > > > > > > > Arizona State University > > > >> >> > >> Ravi > > > >> >> > >> -- > --
> Alpha Chapters of my book on Hadoop are available
> http://www.apress.com/book/view/9781430219422
