Re: Map-Reduce Slow Down

jason hadoop Thu, 16 Apr 2009 23:01:07 -0700

Assuming you are on a Linux box, verify on both machines that the servers
are listening on the ports you expect via
netstat -a -n -t -p
-a show all sockets, including those listening for connections
-n show numeric addresses and ports instead of resolving names
-t only list TCP sockets
-p list the pid/process name (run as root to see other users' processes)


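On the machine 192.168.0.18
you should have a socket in the LISTEN state bound to 0.0.0.0:54310 (or
192.168.0.18:54310) with a process of java, and the pid should be the pid of
your namenode process. A socket bound only to 127.0.0.1:54310 will not accept
connections from other machines.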
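
If you would rather script that check than read the netstat output by eye,
here is a rough Python sketch. It is not part of Hadoop; the function name
find_listeners, the sample lines, and the pid 4242 are made up for
illustration, and the column layout is assumed to be the usual Linux
net-tools one.

```python
def find_listeners(netstat_output, port):
    """Return (local_address, pid_program) pairs for TCP sockets listening on port."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed columns: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_address, state = fields[3], fields[5]
        if state != "LISTEN":
            continue
        # rpartition also copes with IPv6-style addresses such as :::54310
        _, _, local_port = local_address.rpartition(":")
        if local_port == str(port):
            pid_program = fields[6] if len(fields) > 6 else "-"
            matches.append((local_address, pid_program))
    return matches


# Hypothetical sample output; real output on your machines will differ.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:54310           0.0.0.0:*               LISTEN      4242/java
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      -
tcp        0      0 192.168.0.18:22         192.168.0.19:51000      ESTABLISHED -
"""

if __name__ == "__main__":
    for address, owner in find_listeners(SAMPLE, 54310):
        print(address, owner)
```

An empty result for port 54310 would suggest the namenode is not listening at
all; a result whose address starts with 127.0.0.1 would suggest it is only
reachable locally.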

On the remote machine you should be able to *telnet 192.168.0.18 54310* and
have it connect
*Connected to 192.168.0.18.
Escape character is '^]'.
*

If netstat shows the socket listening and the telnet does not connect,
then something is blocking the TCP packets between the machines: one or both
machines have a firewall, an intervening router has a firewall, or there is
some routing problem.
The command /sbin/iptables -L will normally list the firewall rules, if any,
on a Linux machine.
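
The way the connection fails can hint at which of these it is. Below is a
minimal Python sketch, run from the remote machine, that makes one TCP
connection attempt and reports how it ended. The function name probe and the
wording of the diagnoses are my own; the interpretations in the strings are
rules of thumb, not certainties.

```python
import errno
import socket


def probe(host, port, timeout=5.0):
    """Attempt one TCP connection and return a short diagnosis string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # No answer at all: packets are probably being silently dropped.
        return "timed out (possibly a firewall dropping packets)"
    except ConnectionRefusedError:
        # The other side answered with a reset.
        return "refused (nothing listening there, or a firewall rejecting)"
    except OSError as exc:
        if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
            return "unreachable (routing problem or firewall)"
        return "failed: %s" % exc


if __name__ == "__main__":
    # Address and port taken from this thread; adjust for your own cluster.
    print(probe("192.168.0.18", 54310))
```

A result of "connected" corresponds to the successful telnet session shown
above.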


You should be able to use telnet to verify that you can connect from the
remote machine.

On Thu, Apr 16, 2009 at 9:18 PM, Mithila Nagendra <[email protected]> wrote:

> Thanks! I'll see what I can find out.
>
> On Fri, Apr 17, 2009 at 4:55 AM, jason hadoop <[email protected]> wrote:
>
> > The firewall was started at system startup. I think there was an
> > /etc/sysconfig/iptables file present which triggered the firewall.
> > I don't currently have access to any CentOS 5 machines, so I can't easily
> > check.
> >
> >
> >
> > On Thu, Apr 16, 2009 at 6:54 PM, jason hadoop <[email protected]> wrote:
> >
> > > The kickstart script was something that the operations staff was using to
> > > initialize new machines. I never actually saw the script, just figured out
> > > that there was a firewall in place.
> > >
> > >
> > >
> > > On Thu, Apr 16, 2009 at 1:28 PM, Mithila Nagendra <[email protected]> wrote:
> > >
> > >> Jason: the kickstart script - was it something you wrote or is it run when
> > >> the system turns on?
> > >> Mithila
> > >>
> > >> On Thu, Apr 16, 2009 at 1:06 AM, Mithila Nagendra <[email protected]>
> > >> wrote:
> > >>
> > >> > Thanks Jason! Will check that out.
> > >> > Mithila
> > >> >
> > >> >
> > >> > On Thu, Apr 16, 2009 at 5:23 AM, jason hadoop <[email protected]> wrote:
> > >> >
> > >> >> Double check that there is no firewall in place.
> > >> >> At one point a bunch of new machines were kickstarted and placed in a
> > >> >> cluster and they all failed with something similar.
> > >> >> It turned out the kickstart script enabled the firewall with a rule
> > >> >> that blocked ports in the 50k range.
> > >> >> It took us a while to even think to check, as that was not a part of our
> > >> >> normal machine configuration.
> > >> >>
> > >> >> On Wed, Apr 15, 2009 at 11:04 AM, Mithila Nagendra <[email protected]> wrote:
> > >> >>
> > >> >> > Hi Aaron
> > >> >> > I will look into that, thanks!
> > >> >> >
> > >> >> > I spoke to the admin who oversees the cluster. He said that the gateway
> > >> >> > comes into the picture only when one of the nodes communicates with a node
> > >> >> > outside of the cluster. But in my case the communication is carried out
> > >> >> > between nodes that all belong to the same cluster.
> > >> >> >
> > >> >> > Mithila
> > >> >> >
> > >> >> > On Wed, Apr 15, 2009 at 8:59 PM, Aaron Kimball <[email protected]> wrote:
> > >> >> >
> > >> >> > > Hi,
> > >> >> > >
> > >> >> > > I wrote a blog post a while back about connecting nodes via a gateway. See
> > >> >> > > http://www.cloudera.com/blog/2008/12/03/securing-a-hadoop-cluster-through-a-gateway/
> > >> >> > >
> > >> >> > > This assumes that the client is outside the gateway and all
> > >> >> > > datanodes/namenode are inside, but the same principles apply. You'll just
> > >> >> > > need to set up ssh tunnels from every datanode to the namenode.
> > >> >> > >
> > >> >> > > - Aaron
> > >> >> > >
> > >> >> > >
> > >> >> > > On Wed, Apr 15, 2009 at 10:19 AM, Ravi Phulari <[email protected]> wrote:
> > >> >> > >
> > >> >> > >> Looks like your NameNode is down.
> > >> >> > >> Verify that the Hadoop processes are running (jps should show you all
> > >> >> > >> running Java processes).
> > >> >> > >> If your Hadoop processes are running, try restarting them.
> > >> >> > >> I guess this problem is due to your fsimage not being correct.
> > >> >> > >> You might have to format your namenode.
> > >> >> > >> Hope this helps.
> > >> >> > >>
> > >> >> > >> Thanks,
> > >> >> > >> --
> > >> >> > >> Ravi
> > >> >> > >>
> > >> >> > >>
> > >> >> > >> On 4/15/09 10:15 AM, "Mithila Nagendra" <[email protected]> wrote:
> > >> >> > >>
> > >> >> > >> The log file runs into thousands of lines with the same message being
> > >> >> > >> displayed every time.
> > >> >> > >>
> > >> >> > >> On Wed, Apr 15, 2009 at 8:10 PM, Mithila Nagendra <[email protected]> wrote:
> > >> >> > >>
> > >> >> > >> > The log file hadoop-mithila-datanode-node19.log.2009-04-14 has the
> > >> >> > >> > following in it:
> > >> >> > >> >
> > >> >> > >> > 2009-04-14 10:08:11,499 INFO org.apache.hadoop.dfs.DataNode: STARTUP_MSG:
> > >> >> > >> > /************************************************************
> > >> >> > >> > STARTUP_MSG: Starting DataNode
> > >> >> > >> > STARTUP_MSG:   host = node19/127.0.0.1
> > >> >> > >> > STARTUP_MSG:   args = []
> > >> >> > >> > STARTUP_MSG:   version = 0.18.3
> > >> >> > >> > STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18-r 736250; compiled by 'ndaley' on Thu Jan 22 23:12:08 UTC 2009
> > >> >> > >> > ************************************************************/
> > >> >> > >> > 2009-04-14 10:08:12,915 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > >> >> > >> > 2009-04-14 10:08:13,925 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > >> >> > >> > 2009-04-14 10:08:14,935 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> > >> >> > >> > 2009-04-14 10:08:15,945 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> > >> >> > >> > 2009-04-14 10:08:16,955 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> > >> >> > >> > 2009-04-14 10:08:17,965 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> > >> >> > >> > 2009-04-14 10:08:18,975 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> > >> >> > >> > 2009-04-14 10:08:19,985 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> > >> >> > >> > 2009-04-14 10:08:20,995 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> > >> >> > >> > 2009-04-14 10:08:22,005 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> > >> >> > >> > 2009-04-14 10:08:22,008 INFO org.apache.hadoop.ipc.RPC: Server at node18/192.168.0.18:54310 not available yet, Zzzzz...
> > >> >> > >> > 2009-04-14 10:08:24,025 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > >> >> > >> > 2009-04-14 10:08:25,035 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > >> >> > >> > 2009-04-14 10:08:26,045 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> > >> >> > >> > 2009-04-14 10:08:27,055 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> > >> >> > >> > 2009-04-14 10:08:28,065 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> > >> >> > >> > 2009-04-14 10:08:29,075 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> > >> >> > >> > 2009-04-14 10:08:30,085 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> > >> >> > >> > 2009-04-14 10:08:31,095 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> > >> >> > >> > 2009-04-14 10:08:32,105 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> > >> >> > >> > 2009-04-14 10:08:33,115 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> > >> >> > >> > 2009-04-14 10:08:33,116 INFO org.apache.hadoop.ipc.RPC: Server at node18/192.168.0.18:54310 not available yet, Zzzzz...
> > >> >> > >> > 2009-04-14 10:08:35,135 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > >> >> > >> > 2009-04-14 10:08:36,145 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > >> >> > >> > 2009-04-14 10:08:37,155 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> > >> >> > >> >
> > >> >> > >> >
> > >> >> > >> > Hmmm, I still can't figure it out.
> > >> >> > >> >
> > >> >> > >> > Mithila
> > >> >> > >> >
> > >> >> > >> >
> > >> >> > >> > On Tue, Apr 14, 2009 at 10:22 PM, Mithila Nagendra <[email protected]> wrote:
> > >> >> > >> >
> > >> >> > >> >> Also, would the way the port is accessed change if all these nodes are
> > >> >> > >> >> connected through a gateway? I mean in the hadoop-site.xml file. The Ubuntu
> > >> >> > >> >> systems we worked with earlier didn't have a gateway.
> > >> >> > >> >> Mithila
> > >> >> > >> >>
> > >> >> > >> >> On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <[email protected]> wrote:
> > >> >> > >> >>
> > >> >> > >> >>> Aaron: Which log file do I look into? There are a lot of them. Here's
> > >> >> > >> >>> what the error looks like:
> > >> >> > >> >>> [mith...@node19:~]$ cd hadoop
> > >> >> > >> >>> [mith...@node19:~/hadoop]$ bin/hadoop dfs -ls
> > >> >> > >> >>> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> > >> >> > >> >>> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> > >> >> > >> >>> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> > >> >> > >> >>> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> > >> >> > >> >>> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> > >> >> > >> >>> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> > >> >> > >> >>> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> > >> >> > >> >>> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> > >> >> > >> >>> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> > >> >> > >> >>> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> > >> >> > >> >>> Bad connection to FS. command aborted.
> > >> >> > >> >>>
> > >> >> > >> >>> Node19 is a slave and Node18 is the master.
> > >> >> > >> >>>
> > >> >> > >> >>> Mithila
> > >> >> > >> >>>
> > >> >> > >> >>>
> > >> >> > >> >>>
> > >> >> > >> >>> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <[email protected]> wrote:
> > >> >> > >> >>>
> > >> >> > >> >>>> Are there any error messages in the log files on those
> > nodes?
> > >> >> > >> >>>> - Aaron
> > >> >> > >> >>>>
> > >> >> > >> >>>> On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <[email protected]> wrote:
> > >> >> > >> >>>>
> > >> >> > >> >>>> > I've drawn a blank here! Can't figure out what's wrong with the ports. I
> > >> >> > >> >>>> > can ssh between the nodes but can't access the DFS from the slaves - it
> > >> >> > >> >>>> > says "Bad connection to DFS". Master seems to be fine.
> > >> >> > >> >>>> > Mithila
> > >> >> > >> >>>> >
> > >> >> > >> >>>> > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <[email protected]> wrote:
> > >> >> > >> >>>> >
> > >> >> > >> >>>> > > Yes I can..
> > >> >> > >> >>>> > >
> > >> >> > >> >>>> > >
> > >> >> > >> >>>> > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <[email protected]> wrote:
> > >> >> > >> >>>> > >
> > >> >> > >> >>>> > >> Can you ssh between the nodes?
> > >> >> > >> >>>> > >>
> > >> >> > >> >>>> > >> -jim
> > >> >> > >> >>>> > >>
> > >> >> > >> >>>> > >> On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <[email protected]> wrote:
> > >> >> > >> >>>> > >>
> > >> >> > >> >>>> > >> > Thanks Aaron.
> > >> >> > >> >>>> > >> > Jim: The three clusters I set up had Ubuntu running on them, and the DFS was
> > >> >> > >> >>>> > >> > accessed at port 54310. The new cluster which I've set up has Red Hat Linux
> > >> >> > >> >>>> > >> > release 7.2 (Enigma) running on it. Now when I try to access the DFS from one
> > >> >> > >> >>>> > >> > of the slaves I get the following response: dfs cannot be accessed. When I
> > >> >> > >> >>>> > >> > access the DFS through the master there's no problem. So I feel there's a
> > >> >> > >> >>>> > >> > problem with the port. Any ideas? I did check the list of slaves; it looks
> > >> >> > >> >>>> > >> > fine to me.
> > >> >> > >> >>>> > >> >
> > >> >> > >> >>>> > >> > Mithila
> > >> >> > >> >>>> > >> >
> > >> >> > >> >>>> > >> >
> > >> >> > >> >>>> > >> >
> > >> >> > >> >>>> > >> >
> > >> >> > >> >>>> > >> > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <[email protected]> wrote:
> > >> >> > >> >>>> > >> >
> > >> >> > >> >>>> > >> > > Mithila,
> > >> >> > >> >>>> > >> > >
> > >> >> > >> >>>> > >> > > You said all the slaves were being utilized in the 3 node cluster. Which
> > >> >> > >> >>>> > >> > > application did you run to test that and what was your input size? If you
> > >> >> > >> >>>> > >> > > tried the word count application on a 516 MB input file on both cluster
> > >> >> > >> >>>> > >> > > setups, then some of your nodes in the 15 node cluster may not be running
> > >> >> > >> >>>> > >> > > at all. Generally, one map task is assigned to each input split and if you
> > >> >> > >> >>>> > >> > > are running your cluster with the defaults, the splits are 64 MB each. I got
> > >> >> > >> >>>> > >> > > confused when you said the Namenode seemed to do all the work. Can you check
> > >> >> > >> >>>> > >> > > conf/slaves and make sure you put the names of all task trackers there? I
> > >> >> > >> >>>> > >> > > also suggest comparing both clusters with a larger input size, say at least
> > >> >> > >> >>>> > >> > > 5 GB, to really see a difference.
> > >> >> > >> >>>> > >> > >
> > >> >> > >> >>>> > >> > > Jim
> > >> >> > >> >>>> > >> > >
> > >> >> > >> >>>> > >> > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <[email protected]> wrote:
> > >> >> > >> >>>> > >> > >
> > >> >> > >> >>>> > >> > > > In hadoop-*-examples.jar, use "randomwriter" to generate the data and
> > >> >> > >> >>>> > >> > > > "sort" to sort it.
> > >> >> > >> >>>> > >> > > > - Aaron
> > >> >> > >> >>>> > >> > > >
> > >> >> > >> >>>> > >> > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <[email protected]> wrote:
> > >> >> > >> >>>> > >> > > >
> > >> >> > >> >>>> > >> > > > > Your data is too small, I guess, for 15 nodes. So it might be the overhead
> > >> >> > >> >>>> > >> > > > > time of these nodes making your total MR jobs more time consuming.
> > >> >> > >> >>>> > >> > > > > I guess you will have to try with a larger set of data.
> > >> >> > >> >>>> > >> > > > >
> > >> >> > >> >>>> > >> > > > > Pankil
> > >> >> > >> >>>> > >> > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <[email protected]> wrote:
> > >> >> > >> >>>> > >> > > > >
> > >> >> > >> >>>> > >> > > > > > Aaron
> > >> >> > >> >>>> > >> > > > > >
> > >> >> > >> >>>> > >> > > > > > That could be the issue, my data is just 516MB - wouldn't this see a bit
> > >> >> > >> >>>> > >> > > > > > of speed up?
> > >> >> > >> >>>> > >> > > > > > Could you guide me to the example? I'll run my cluster on it and see what
> > >> >> > >> >>>> > >> > > > > > I get. Also for my program I had a Java timer running to record the time
> > >> >> > >> >>>> > >> > > > > > taken to complete execution. Does Hadoop have an inbuilt timer?
> > >> >> > >> >>>> > >> > > > > >
> > >> >> > >> >>>> > >> > > > > > Mithila
> > >> >> > >> >>>> > >> > > > > >
> > >> >> > >> >>>> > >> > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <[email protected]> wrote:
> > >> >> > >> >>>> > >> > > > > >
> > >> >> > >> >>>> > >> > > > > > > Virtually none of the examples that ship with Hadoop are designed to
> > >> >> > >> >>>> > >> > > > > > > showcase its speed. Hadoop's speedup comes from its ability to process
> > >> >> > >> >>>> > >> > > > > > > very large volumes of data (starting around, say, tens of GB per job, and
> > >> >> > >> >>>> > >> > > > > > > going up in orders of magnitude from there). So if you are timing the pi
> > >> >> > >> >>>> > >> > > > > > > calculator (or something like that), its results won't necessarily be
> > >> >> > >> >>>> > >> > > > > > > very consistent. If a job doesn't have enough fragments of data to
> > >> >> > >> >>>> > >> > > > > > > allocate one per each node, some of the nodes will also just go unused.
> > >> >> > >> >>>> > >> > > > > > >
> > >> >> > >> >>>> > >> > > > > > > The best example for you to run is to use randomwriter to fill up your
> > >> >> > >> >>>> > >> > > > > > > cluster with several GB of random data and then run the sort program. If
> > >> >> > >> >>>> > >> > > > > > > that doesn't scale up performance from 3 nodes to 15, then you've
> > >> >> > >> >>>> > >> > > > > > > definitely got something strange going on.
> > >> >> > >> >>>> > >> > > > > > >
> > >> >> > >> >>>> > >> > > > > > > - Aaron
> > >> >> > >> >>>> > >> > > > > > >
> > >> >> > >> >>>> > >> > > > > > >
> > >> >> > >> >>>> > >> > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]> wrote:
> > >> >> > >> >>>> > >> > > > > > >
> > >> >> > >> >>>> > >> > > > > > > > Hey all
> > >> >> > >> >>>> > >> > > > > > > > I recently set up a three node Hadoop cluster and ran an example on it.
> > >> >> > >> >>>> > >> > > > > > > > It was pretty fast, and all three nodes were being used (I checked the
> > >> >> > >> >>>> > >> > > > > > > > log files to make sure that the slaves were utilized).
> > >> >> > >> >>>> > >> > > > > > > >
> > >> >> > >> >>>> > >> > > > > > > > Now I've set up another cluster consisting of 15 nodes. I ran the same
> > >> >> > >> >>>> > >> > > > > > > > example, but instead of speeding up, the map-reduce task seems to take
> > >> >> > >> >>>> > >> > > > > > > > forever! The slaves are not being used for some reason. This second
> > >> >> > >> >>>> > >> > > > > > > > cluster has lower per-node processing power, but should that make any
> > >> >> > >> >>>> > >> > > > > > > > difference?
> > >> >> > >> >>>> > >> > > > > > > > How can I ensure that the data is being mapped to all the nodes?
> > >> >> > >> >>>> > >> > > > > > > > Presently, the only node that seems to be doing all the work is the
> > >> >> > >> >>>> > >> > > > > > > > Master node.
> > >> >> > >> >>>> > >> > > > > > > >
> > >> >> > >> >>>> > >> > > > > > > > Do 15 nodes in a cluster increase the network cost? What can I do to
> > >> >> > >> >>>> > >> > > > > > > > set up the cluster to function more efficiently?
> > >> >> > >> >>>> > >> > > > > > > >
> > >> >> > >> >>>> > >> > > > > > > > Thanks!
> > >> >> > >> >>>> > >> > > > > > > > Mithila Nagendra
> > >> >> > >> >>>> > >> > > > > > > > Arizona State University
> > >> >> > >> >>>> > >> > > > > > > >
> > >> >> --
> > >> >> Alpha Chapters of my book on Hadoop are available
> > >> >> http://www.apress.com/book/view/9781430219422



-- 
Alpha Chapters of my book on Hadoop are available
http://www.apress.com/book/view/9781430219422
