Hey Jason
The problem is fixed! :) My network admin had messed something up. Now it
works! Thanks for your help!

Mithila

On Thu, Apr 16, 2009 at 11:58 PM, Mithila Nagendra <[email protected]> wrote:

> Thanks Jason! This helps a lot. I'm planning to talk to my network admin
> tomorrow. I'm hoping he'll be able to fix this problem.
> Mithila
>
>
> On Fri, Apr 17, 2009 at 9:00 AM, jason hadoop <[email protected]> wrote:
>
>> Assuming you are on a Linux box, on both machines verify that the
>> servers are listening on the ports you expect via
>> netstat -a -n -t -p
>>   -a  show all sockets, including listening ones
>>   -n  do not translate IP addresses to host names
>>   -t  only list TCP sockets
>>   -p  list the PID/process name
>>
>> On the machine 192.168.0.18 you should have a socket bound to
>> 0.0.0.0:54310 with a process name of java, and the PID should be the
>> PID of your namenode process.
>>
>> On the remote machine you should be able to run *telnet 192.168.0.18
>> 54310* and have it connect:
>> *Connected to 192.168.0.18.
>> Escape character is '^]'.
>> *
>>
>> If netstat shows the socket accepting connections but the telnet does
>> not connect, then something is blocking the TCP packets between the
>> machines: one or both machines has a firewall, an intervening router
>> has a firewall, or there is some routing problem. On a Linux machine,
>> the command /sbin/iptables -L will normally list the firewall rules,
>> if any.
>>
>>
>> You should be able to use telnet to verify that you can connect from the
>> remote machine.
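The telnet-style check Jason describes can also be scripted. Below is a minimal Python sketch (the function name and the commented-out addresses are illustrative, not from the thread) that reports whether a TCP connection to a host/port can be opened:

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds,
    mimicking the 'telnet <host> <port>' connectivity test."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable.
        return False

# Example: probe the namenode's RPC port (replace with your master's address):
# can_connect("192.168.0.18", 54310)
```

If this returns False while netstat on the server shows the port listening, a firewall or routing problem between the machines is the likely culprit.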
>>
>> On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra <[email protected]>
>> wrote:
>>
>> > Thanks! I'll see what I can find out.
>> >
>> > On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop <[email protected]> wrote:
>> >
>> > > The firewall was started at system boot; I think there was an
>> > > /etc/sysconfig/iptables file present which triggered it. I don't
>> > > currently have access to any CentOS 5 machines, so I can't easily
>> > > check.
>> > >
>> > >
>> > >
>> > > On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop <[email protected]> wrote:
>> > >
>> > > > The kickstart script was something that the operations staff was
>> > > > using to initialize new machines. I never actually saw the script;
>> > > > I just figured out that there was a firewall in place.
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra <[email protected]> wrote:
>> > > >
>> > > >> Jason: the kickstart script - was it something you wrote, or is
>> > > >> it run when the system boots?
>> > > >> Mithila
>> > > >>
>> > > >> On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra <[email protected]> wrote:
>> > > >>
>> > > >> > Thanks Jason! Will check that out.
>> > > >> > Mithila
>> > > >> >
>> > > >> >
>> > > >> > On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop <[email protected]> wrote:
>> > > >> >
>> > > >> >> Double check that there is no firewall in place.
>> > > >> >> At one point a bunch of new machines were kickstarted and
>> > > >> >> placed in a cluster, and they all failed with something
>> > > >> >> similar. It turned out the kickstart script had enabled the
>> > > >> >> firewall with a rule that blocked ports in the 50k range.
>> > > >> >> It took us a while to even think to check, since that was not
>> > > >> >> a part of our normal machine configuration.
>> > > >> >>
>> > > >> >> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra <[email protected]> wrote:
>> > > >> >>
>> > > >> >> > Hi Aaron
>> > > >> >> > I will look into that, thanks!
>> > > >> >> >
>> > > >> >> > I spoke to the admin who oversees the cluster. He said that
>> > > >> >> > the gateway comes into the picture only when one of the
>> > > >> >> > nodes communicates with a node outside of the cluster. But
>> > > >> >> > in my case the communication is carried out between nodes
>> > > >> >> > which all belong to the same cluster.
>> > > >> >> >
>> > > >> >> > Mithila
>> > > >> >> >
>> > > >> >> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <[email protected]> wrote:
>> > > >> >> >
>> > > >> >> > > Hi,
>> > > >> >> > >
>> > > >> >> > > I wrote a blog post a while back about connecting nodes
>> > > >> >> > > via a gateway. See
>> > > >> >> > > http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
>> > > >> >> > >
>> > > >> >> > > This assumes that the client is outside the gateway and
>> > > >> >> > > all datanodes/namenode are inside, but the same principles
>> > > >> >> > > apply. You'll just need to set up ssh tunnels from every
>> > > >> >> > > datanode to the namenode.
>> > > >> >> > >
>> > > >> >> > > - Aaron
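A minimal sketch of the tunnel Aaron describes; the user and gateway names are placeholders, and the 54310 port is the one used throughout this thread:

```
# Run on each datanode: forward local port 54310 to the namenode (node18)
# through the gateway host. -N means "no remote command, just forward".
ssh -N -L 54310:node18:54310 user@gateway
# The datanode would then address the namenode as localhost:54310.
```

Note this is only needed when traffic must cross the gateway; nodes inside the same cluster network would normally talk to node18:54310 directly.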
>> > > >> >> > >
>> > > >> >> > >
>> > > >> >> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <[email protected]> wrote:
>> > > >> >> > >
>> > > >> >> > >> Looks like your NameNode is down.
>> > > >> >> > >> Verify that the Hadoop processes are running (jps should
>> > > >> >> > >> show you all running Java processes).
>> > > >> >> > >> If your Hadoop processes are running, try restarting
>> > > >> >> > >> them. I guess this problem is due to your fsimage not
>> > > >> >> > >> being correct; you might have to format your namenode.
>> > > >> >> > >> Hope this helps.
>> > > >> >> > >>
>> > > >> >> > >> Thanks,
>> > > >> >> > >> --
>> > > >> >> > >> Ravi
>> > > >> >> > >>
>> > > >> >> > >>
>> > > >> >> > >> On 4/15/09 10:15 AM, "Mithila Nagendra" <[email protected]> wrote:
>> > > >> >> > >>
>> > > >> >> > >> The log file runs into thousands of lines, with the same
>> > > >> >> > >> message being displayed every time.
>> > > >> >> > >>
>> > > >> >> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <[email protected]> wrote:
>> > > >> >> > >>
>> > > >> >> > >> > The log file hadoop-mithila-datanode-node19.log.2009-04-14
>> > > >> >> > >> > has the following in it:
>> > > >> >> > >> >
>> > > >> >> > >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
>> > > >> >> > >> > /************************************************************
>> > > >> >> > >> > STARTUP_MSG: Starting DataNode
>> > > >> >> > >> > STARTUP_MSG:   host = node19/127.0.0.1
>> > > >> >> > >> > STARTUP_MSG:   args = []
>> > > >> >> > >> > STARTUP_MSG:   version = 0.18.3
>> > > >> >> > >> > STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18 -r 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
>> > > >> >> > >> > ************************************************************/
>> > > >> >> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > > >> >> > >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> > > >> >> > >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> > > >> >> > >> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>> > > >> >> > >> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>> > > >> >> > >> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>> > > >> >> > >> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>> > > >> >> > >> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>> > > >> >> > >> > 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>> > > >> >> > >> > 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>> > > >> >> > >> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/192.168.0.18:54310 not available yet, Zzzzz...
>> > > >> >> > >> > [the same cycle of ten "Retrying connect" messages followed by "not available yet, Zzzzz..." repeats from here on]
>> > > >> >> > >> >
>> > > >> >> > >> >
>> > > >> >> > >> > Hmmm, I still can't figure it out...
>> > > >> >> > >> >
>> > > >> >> > >> > Mithila
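What the log above shows is Hadoop's IPC client retry loop: ten retries about a second apart, a longer pause ("Zzzzz..."), then the cycle repeats. A rough Python sketch of that loop (retry count and delay are inferred from the timestamps above, not taken from Hadoop's source):

```python
import time

def retry_connect(connect, max_retries=10, delay=1.0):
    """Try connect() up to max_retries times, pausing between attempts.

    Returns the number of attempts made on success; raises ConnectionError
    if all attempts fail, mirroring the "Already tried N time(s)" messages.
    """
    for attempt in range(max_retries):
        try:
            connect()
            return attempt + 1
        except ConnectionError:
            print(f"Retrying connect to server. Already tried {attempt} time(s).")
            time.sleep(delay)
    raise ConnectionError("Server not available yet")
```

The point is that the datanode never gives up on its own; the messages simply repeat until the namenode becomes reachable, which is why the log grows to thousands of lines.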
>> > > >> >> > >> >
>> > > >> >> > >> >
>> > > >> >> > >> > On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <[email protected]> wrote:
>> > > >> >> > >> >
>> > > >> >> > >> >> Also, would the way the port is accessed change if all
>> > > >> >> > >> >> these nodes are connected through a gateway? I mean in
>> > > >> >> > >> >> the hadoop-site.xml file? The Ubuntu systems we worked
>> > > >> >> > >> >> with earlier didn't have a gateway.
>> > > >> >> > >> >> Mithila
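For reference, the port in question is set by fs.default.name in hadoop-site.xml, and every node must point at the same master address. A fragment matching the node18:54310 setup in this thread would look roughly like:

```xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://node18:54310</value>
</property>
```

If a slave resolves node18 to the wrong address (or a firewall blocks 54310), you get exactly the "Retrying connect to server" loop shown in the logs.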
>> > > >> >> > >> >>
>> > > >> >> > >> >> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <[email protected]> wrote:
>> > > >> >> > >> >>
>> > > >> >> > >> >>> Aaron: Which log file do I look into - there are a
>> > > >> >> > >> >>> lot of them. Here's what the error looks like:
>> > > >> >> > >> >>> [mith...@node19:~]$ cd hadoop
>> > > >> >> > >> >>> [mith...@node19:~/hadoop]$ bin/hadoop dfs -ls
>> > > >> >> > >> >>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
>> > > >> >> > >> >>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
>> > > >> >> > >> >>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
>> > > >> >> > >> >>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
>> > > >> >> > >> >>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
>> > > >> >> > >> >>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
>> > > >> >> > >> >>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
>> > > >> >> > >> >>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
>> > > >> >> > >> >>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
>> > > >> >> > >> >>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
>> > > >> >> > >> >>> Bad connection to FS. command aborted.
>> > > >> >> > >> >>>
>> > > >> >> > >> >>> Node19 is a slave and Node18 is the master.
>> > > >> >> > >> >>>
>> > > >> >> > >> >>> Mithila
>> > > >> >> > >> >>>
>> > > >> >> > >> >>>
>> > > >> >> > >> >>>
>> > > >> >> > >> >>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <[email protected]> wrote:
>> > > >> >> > >> >>>
>> > > >> >> > >> >>>> Are there any error messages in the log files on those nodes?
>> > > >> >> > >> >>>> - Aaron
>> > > >> >> > >> >>>>
>> > > >> >> > >> >>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <[email protected]> wrote:
>> > > >> >> > >> >>>>
>> > > >> >> > >> >>>> > I've drawn a blank here! I can't figure out what's
>> > > >> >> > >> >>>> > wrong with the ports. I can ssh between the nodes
>> > > >> >> > >> >>>> > but can't access the DFS from the slaves - it says
>> > > >> >> > >> >>>> > "Bad connection to DFS". The master seems to be
>> > > >> >> > >> >>>> > fine.
>> > > >> >> > >> >>>> > Mithila
>> > > >> >> > >> >>>> >
>> > > >> >> > >> >>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <[email protected]> wrote:
>> > > >> >> > >> >>>> >
>> > > >> >> > >> >>>> > > Yes, I can.
>> > > >> >> > >> >>>> > >
>> > > >> >> > >> >>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <[email protected]> wrote:
>> > > >> >> > >> >>>> > >
>> > > >> >> > >> >>>> > >> Can you ssh between the nodes?
>> > > >> >> > >> >>>> > >>
>> > > >> >> > >> >>>> > >> -jim
>> > > >> >> > >> >>>> > >>
>> > > >> >> > >> >>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <[email protected]> wrote:
>> > > >> >> > >> >>>> > >>
>> > > >> >> > >> >>>> > >> > Thanks Aaron.
>> > > >> >> > >> >>>> > >> > Jim: The three clusters I set up had Ubuntu
>> > > >> >> > >> >>>> > >> > running on them, and the DFS was accessed at
>> > > >> >> > >> >>>> > >> > port 54310. The new cluster which I've set up
>> > > >> >> > >> >>>> > >> > has Red Hat Linux release 7.2 (Enigma)
>> > > >> >> > >> >>>> > >> > running on it. Now when I try to access the
>> > > >> >> > >> >>>> > >> > DFS from one of the slaves I get the
>> > > >> >> > >> >>>> > >> > following response: "dfs cannot be accessed".
>> > > >> >> > >> >>>> > >> > When I access the DFS through the master
>> > > >> >> > >> >>>> > >> > there's no problem, so I feel there's a
>> > > >> >> > >> >>>> > >> > problem with the port. Any ideas? I did check
>> > > >> >> > >> >>>> > >> > the list of slaves; it looks fine to me.
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> > Mithila
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <[email protected]> wrote:
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >> > > Mithila,
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> > > You said all the slaves were being utilized
>> > > >> >> > >> >>>> > >> > > in the 3-node cluster. Which application
>> > > >> >> > >> >>>> > >> > > did you run to test that, and what was your
>> > > >> >> > >> >>>> > >> > > input size? If you tried the word count
>> > > >> >> > >> >>>> > >> > > application on a 516 MB input file on both
>> > > >> >> > >> >>>> > >> > > cluster setups, then some of your nodes in
>> > > >> >> > >> >>>> > >> > > the 15-node cluster may not be running at
>> > > >> >> > >> >>>> > >> > > all. Generally, one map task is assigned to
>> > > >> >> > >> >>>> > >> > > each input split, and if you are running
>> > > >> >> > >> >>>> > >> > > your cluster with the defaults, the splits
>> > > >> >> > >> >>>> > >> > > are 64 MB each. I got confused when you
>> > > >> >> > >> >>>> > >> > > said the Namenode seemed to do all the
>> > > >> >> > >> >>>> > >> > > work. Can you check conf/slaves and make
>> > > >> >> > >> >>>> > >> > > sure you put the names of all task
>> > > >> >> > >> >>>> > >> > > trackers there? I also suggest comparing
>> > > >> >> > >> >>>> > >> > > both clusters with a larger input size,
>> > > >> >> > >> >>>> > >> > > say at least 5 GB, to really see a
>> > > >> >> > >> >>>> > >> > > difference.
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> > > Jim
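The split arithmetic behind Jim's point can be made concrete: with the default 64 MB split size, a 516 MB input yields only nine splits, hence at most nine map tasks, so at least six of the fifteen nodes get no map work at all. A quick back-of-the-envelope check (the helper name is my own):

```python
import math

def num_splits(input_mb, split_mb=64):
    """Number of input splits (and hence map tasks) for a given input size,
    assuming the default one-map-task-per-split behavior."""
    return math.ceil(input_mb / split_mb)

print(num_splits(516))  # 9 splits -> at most 9 of the 15 nodes run a map task
```

At 5 GB the same input would produce 80 splits, enough to keep all fifteen nodes busy through several waves of map tasks.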
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <[email protected]> wrote:
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> > > > In hadoop-*-examples.jar, use
>> > > >> >> > >> >>>> > >> > > > "randomwriter" to generate the data and
>> > > >> >> > >> >>>> > >> > > > "sort" to sort it.
>> > > >> >> > >> >>>> > >> > > > - Aaron
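Concretely, for the 0.18.3 release in this thread the benchmark would be invoked roughly like this (the output paths are illustrative; randomwriter's volume per node is configurable and defaults to several GB):

```
bin/hadoop jar hadoop-0.18.3-examples.jar randomwriter /bench/random-data
bin/hadoop jar hadoop-0.18.3-examples.jar sort /bench/random-data /bench/sorted-data
```

The sort job's wall-clock time, reported at the end of the run, is the number to compare between the 3-node and 15-node clusters.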
>> > > >> >> > >> >>>> > >> > > >
>> > > >> >> > >> >>>> > >> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <[email protected]> wrote:
>> > > >> >> > >> >>>> > >> > > >
>> > > >> >> > >> >>>> > >> > > > > Your data is too small, I guess, for 15
>> > > >> >> > >> >>>> > >> > > > > nodes, so the startup overhead on those
>> > > >> >> > >> >>>> > >> > > > > nodes might be making your total MR
>> > > >> >> > >> >>>> > >> > > > > jobs more time consuming. I guess you
>> > > >> >> > >> >>>> > >> > > > > will have to try with a larger set of
>> > > >> >> > >> >>>> > >> > > > > data.
>> > > >> >> > >> >>>> > >> > > > >
>> > > >> >> > >> >>>> > >> > > > > Pankil
>> > > >> >> > >> >>>> > >> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <[email protected]> wrote:
>> > > >> >> > >> >>>> > >> > > > >
>> > > >> >> > >> >>>> > >> > > > > > Aaron
>> > > >> >> > >> >>>> > >> > > > > >
>> > > >> >> > >> >>>> > >> > > > > > That could be the issue; my data is
>> > > >> >> > >> >>>> > >> > > > > > just 516 MB - wouldn't this still
>> > > >> >> > >> >>>> > >> > > > > > see a bit of speedup? Could you
>> > > >> >> > >> >>>> > >> > > > > > guide me to the example? I'll run my
>> > > >> >> > >> >>>> > >> > > > > > cluster on it and see what I get.
>> > > >> >> > >> >>>> > >> > > > > > Also, for my program I had a Java
>> > > >> >> > >> >>>> > >> > > > > > timer running to record the time
>> > > >> >> > >> >>>> > >> > > > > > taken to complete execution. Does
>> > > >> >> > >> >>>> > >> > > > > > Hadoop have a built-in timer?
>> > > >> >> > >> >>>> > >> > > > > >
>> > > >> >> > >> >>>> > >> > > > > > Mithila
>> > > >> >> > >> >>>> > >> > > > > >
>> > > >> >> > >> >>>> > >> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <[email protected]> wrote:
>> > > >> >> > >> >>>> > >> > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > Virtually none of the examples
>> > > >> >> > >> >>>> > >> > > > > > > that ship with Hadoop are designed
>> > > >> >> > >> >>>> > >> > > > > > > to showcase its speed. Hadoop's
>> > > >> >> > >> >>>> > >> > > > > > > speedup comes from its ability to
>> > > >> >> > >> >>>> > >> > > > > > > process very large volumes of data
>> > > >> >> > >> >>>> > >> > > > > > > (starting around, say, tens of GB
>> > > >> >> > >> >>>> > >> > > > > > > per job, and going up in orders of
>> > > >> >> > >> >>>> > >> > > > > > > magnitude from there). So if you
>> > > >> >> > >> >>>> > >> > > > > > > are timing the pi calculator (or
>> > > >> >> > >> >>>> > >> > > > > > > something like that), its results
>> > > >> >> > >> >>>> > >> > > > > > > won't necessarily be very
>> > > >> >> > >> >>>> > >> > > > > > > consistent. If a job doesn't have
>> > > >> >> > >> >>>> > >> > > > > > > enough fragments of data to
>> > > >> >> > >> >>>> > >> > > > > > > allocate one to each node, some of
>> > > >> >> > >> >>>> > >> > > > > > > the nodes will also just go
>> > > >> >> > >> >>>> > >> > > > > > > unused.
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > The best example for you to run is
>> > > >> >> > >> >>>> > >> > > > > > > to use randomwriter to fill up
>> > > >> >> > >> >>>> > >> > > > > > > your cluster with several GB of
>> > > >> >> > >> >>>> > >> > > > > > > random data and then run the sort
>> > > >> >> > >> >>>> > >> > > > > > > program. If that doesn't scale up
>> > > >> >> > >> >>>> > >> > > > > > > performance from 3 nodes to 15,
>> > > >> >> > >> >>>> > >> > > > > > > then you've definitely got
>> > > >> >> > >> >>>> > >> > > > > > > something strange going on.
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > - Aaron
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]> wrote:
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > > Hey all
>> > > >> >> > >> >>>> > >> > > > > > > > I recently set up a three-node
>> > > >> >> > >> >>>> > >> > > > > > > > Hadoop cluster and ran an
>> > > >> >> > >> >>>> > >> > > > > > > > example on it. It was pretty
>> > > >> >> > >> >>>> > >> > > > > > > > fast, and all three nodes were
>> > > >> >> > >> >>>> > >> > > > > > > > being used (I checked the log
>> > > >> >> > >> >>>> > >> > > > > > > > files to make sure that the
>> > > >> >> > >> >>>> > >> > > > > > > > slaves were utilized).
>> > > >> >> > >> >>>> > >> > > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > > Now I've set up another cluster
>> > > >> >> > >> >>>> > >> > > > > > > > consisting of 15 nodes. I ran
>> > > >> >> > >> >>>> > >> > > > > > > > the same example, but instead of
>> > > >> >> > >> >>>> > >> > > > > > > > speeding up, the map-reduce task
>> > > >> >> > >> >>>> > >> > > > > > > > seems to take forever! The
>> > > >> >> > >> >>>> > >> > > > > > > > slaves are not being used for
>> > > >> >> > >> >>>> > >> > > > > > > > some reason. This second cluster
>> > > >> >> > >> >>>> > >> > > > > > > > has lower per-node processing
>> > > >> >> > >> >>>> > >> > > > > > > > power, but should that make any
>> > > >> >> > >> >>>> > >> > > > > > > > difference? How can I ensure
>> > > >> >> > >> >>>> > >> > > > > > > > that the data is being mapped to
>> > > >> >> > >> >>>> > >> > > > > > > > all the nodes? Presently, the
>> > > >> >> > >> >>>> > >> > > > > > > > only node that seems to be doing
>> > > >> >> > >> >>>> > >> > > > > > > > all the work is the master node.
>> > > >> >> > >> >>>> > >> > > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > > Does having 15 nodes in a
>> > > >> >> > >> >>>> > >> > > > > > > > cluster increase the network
>> > > >> >> > >> >>>> > >> > > > > > > > cost? What can I do to set up
>> > > >> >> > >> >>>> > >> > > > > > > > the cluster to function more
>> > > >> >> > >> >>>> > >> > > > > > > > efficiently?
>> > > >> >> > >> >>>> > >> > > > > > > >
>> > > >> >> > >> >>>> > >> > > > > > > > Thanks!
>> > > >> >> > >> >>>> > >> > > > > > > > Mithila Nagendra
>> > > >> >> > >> >>>> > >> > > > > > > > Arizona State University
>> > > >> >> > >> >>>> > >> > > > > > >
>> > > >> >> > >> >>>> > >> > > > > >
>> > > >> >> > >> >>>> > >> > > > >
>> > > >> >> > >> >>>> > >> > > >
>> > > >> >> > >> >>>> > >> > >
>> > > >> >> > >> >>>> > >> >
>> > > >> >> > >> >>>> > >>
>> > > >> >> > >> >>>> > >
>> > > >> >> > >> >>>> > >
>> > > >> >> > >> >>>> >
>> > > >> >> > >> >>>>
>> > > >> >> > >> >>>
>> > > >> >> > >> >>>
>> > > >> >> > >> >>
>> > > >> >> > >> >
>> > > >> >> > >>
>> > > >> >> > >>
>> > > >> >> > >>
>> > > >> >> > >>
>> > > >> >> > >
>> > > >> >> >
>> > > >> >>
>> > > >> >>
>> > > >> >>
>> > > >> >> --
>> > > >> >> Alpha Chapters of my book on Hadoop are available
>> > > >> >> http://www.apress.com/book/view/9781430219422
>> > > >> >>
>> > > >> >
>> > > >> >
>> > > >>
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> > >
>> > >
>> > >
>> >
>>
>>
>>
>>
>
>
