Also, would the way the port is accessed change if all these nodes are
connected through a gateway? I mean in the hadoop-site.xml file? The Ubuntu
systems we worked with earlier didn't have a gateway.

Mithila
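For reference, the entry in question would look something like this (a sketch of a standard Hadoop 0.18-era conf/hadoop-site.xml; the hostname "node18" and port 54310 are taken from this thread, and every node has to resolve that hostname to the same reachable address, gateway or not):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- All nodes, master and slaves alike, point at the master's DFS port. -->
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node18:54310</value>
  </property>
</configuration>
```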
On Tue, Apr 14, 2009 at 9:48 PM, Mithila Nagendra <[email protected]> wrote:
> Aaron: Which log file do I look into - there are a lot of them. Here's
> what the error looks like:
>
> [mith...@node19:~]$ cd hadoop
> [mith...@node19:~/hadoop]$ bin/hadoop dfs -ls
> 09/04/14 10:09:29 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 0 time(s).
> 09/04/14 10:09:30 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 1 time(s).
> 09/04/14 10:09:31 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 2 time(s).
> 09/04/14 10:09:32 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 3 time(s).
> 09/04/14 10:09:33 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 4 time(s).
> 09/04/14 10:09:34 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 5 time(s).
> 09/04/14 10:09:35 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 6 time(s).
> 09/04/14 10:09:36 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 7 time(s).
> 09/04/14 10:09:37 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 8 time(s).
> 09/04/14 10:09:38 INFO ipc.Client: Retrying connect to server: node18/192.168.0.18:54310. Already tried 9 time(s).
> Bad connection to FS. command aborted.
>
> Node19 is a slave and Node18 is the master.
>
> Mithila
>
> On Tue, Apr 14, 2009 at 8:53 PM, Aaron Kimball <[email protected]> wrote:
>
> > Are there any error messages in the log files on those nodes?
> > - Aaron
> >
> > On Tue, Apr 14, 2009 at 9:03 AM, Mithila Nagendra <[email protected]> wrote:
> >
> > > I've drawn a blank here! Can't figure out what's wrong with the ports.
> > > I can ssh between the nodes but can't access the DFS from the slaves -
> > > says "Bad connection to DFS". Master seems to be fine.
> > > Mithila
> > >
> > > On Tue, Apr 14, 2009 at 4:28 AM, Mithila Nagendra <[email protected]> wrote:
> > >
> > > > Yes I can..
> > > >
> > > > On Mon, Apr 13, 2009 at 5:12 PM, Jim Twensky <[email protected]> wrote:
> > > >
> > > > > Can you ssh between the nodes?
> > > > >
> > > > > -jim
> > > > >
> > > > > On Mon, Apr 13, 2009 at 6:49 PM, Mithila Nagendra <[email protected]> wrote:
> > > > >
> > > > > > Thanks Aaron.
> > > > > > Jim: The three clusters I set up had Ubuntu running on them and
> > > > > > the DFS was accessed at port 54310. The new cluster which I've
> > > > > > set up has Red Hat Linux release 7.2 (Enigma) running on it.
> > > > > > Now when I try to access the DFS from one of the slaves I get
> > > > > > the following response: dfs cannot be accessed. When I access
> > > > > > the DFS through the master there's no problem. So I feel there's
> > > > > > a problem with the port. Any ideas? I did check the list of
> > > > > > slaves, it looks fine to me.
> > > > > >
> > > > > > Mithila
> > > > > >
> > > > > > On Mon, Apr 13, 2009 at 2:58 PM, Jim Twensky <[email protected]> wrote:
> > > > > >
> > > > > > > Mithila,
> > > > > > >
> > > > > > > You said all the slaves were being utilized in the 3 node
> > > > > > > cluster. Which application did you run to test that and what
> > > > > > > was your input size? If you tried the word count application
> > > > > > > on a 516 MB input file on both cluster setups, then some of
> > > > > > > your nodes in the 15 node cluster may not be running at all.
> > > > > > > Generally, one map job is assigned to each input split and if
> > > > > > > you are running your cluster with the defaults, the splits
> > > > > > > are 64 MB each.
> > > > > > > I got confused when you said the Namenode seemed to do all
> > > > > > > the work. Can you check conf/slaves and make sure you put the
> > > > > > > names of all the task trackers there? I also suggest comparing
> > > > > > > both clusters with a larger input size, say at least 5 GB, to
> > > > > > > really see a difference.
> > > > > > >
> > > > > > > Jim
> > > > > > >
> > > > > > > On Mon, Apr 13, 2009 at 4:17 PM, Aaron Kimball <[email protected]> wrote:
> > > > > > >
> > > > > > > > in hadoop-*-examples.jar, use "randomwriter" to generate
> > > > > > > > the data and "sort" to sort it.
> > > > > > > > - Aaron
> > > > > > > >
> > > > > > > > On Sun, Apr 12, 2009 at 9:33 PM, Pankil Doshi <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Your data is too small, I guess, for a 15 node cluster,
> > > > > > > > > so it might be the overhead time of these nodes that is
> > > > > > > > > making your total MR jobs more time consuming. I guess
> > > > > > > > > you will have to try with a larger set of data..
> > > > > > > > >
> > > > > > > > > Pankil
> > > > > > > > >
> > > > > > > > > On Sun, Apr 12, 2009 at 6:54 PM, Mithila Nagendra <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Aaron
> > > > > > > > > >
> > > > > > > > > > That could be the issue, my data is just 516 MB -
> > > > > > > > > > wouldn't this see a bit of speed up? Could you guide me
> > > > > > > > > > to the example? I'll run my cluster on it and see what
> > > > > > > > > > I get. Also for my program I had a Java timer running
> > > > > > > > > > to record the time taken to complete execution. Does
> > > > > > > > > > Hadoop have an inbuilt timer?
> > > > > > > > > > Mithila
> > > > > > > > > >
> > > > > > > > > > On Mon, Apr 13, 2009 at 1:13 AM, Aaron Kimball <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Virtually none of the examples that ship with Hadoop
> > > > > > > > > > > are designed to showcase its speed. Hadoop's speedup
> > > > > > > > > > > comes from its ability to process very large volumes
> > > > > > > > > > > of data (starting around, say, tens of GB per job,
> > > > > > > > > > > and going up in orders of magnitude from there). So
> > > > > > > > > > > if you are timing the pi calculator (or something
> > > > > > > > > > > like that), its results won't necessarily be very
> > > > > > > > > > > consistent. If a job doesn't have enough fragments of
> > > > > > > > > > > data to allocate one per each node, some of the nodes
> > > > > > > > > > > will also just go unused.
> > > > > > > > > > >
> > > > > > > > > > > The best example for you to run is to use
> > > > > > > > > > > randomwriter to fill up your cluster with several GB
> > > > > > > > > > > of random data and then run the sort program. If
> > > > > > > > > > > that doesn't scale up performance from 3 nodes to 15,
> > > > > > > > > > > then you've definitely got something strange going on.
> > > > > > > > > > >
> > > > > > > > > > > - Aaron
> > > > > > > > > > >
> > > > > > > > > > > On Sun, Apr 12, 2009 at 8:39 AM, Mithila Nagendra <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hey all
> > > > > > > > > > > > I recently set up a three node Hadoop cluster and
> > > > > > > > > > > > ran an example on it.
> > > > > > > > > > > > It was pretty fast, and all the three nodes were
> > > > > > > > > > > > being used (I checked the log files to make sure
> > > > > > > > > > > > that the slaves are utilized).
> > > > > > > > > > > >
> > > > > > > > > > > > Now I've set up another cluster consisting of 15
> > > > > > > > > > > > nodes. I ran the same example, but instead of
> > > > > > > > > > > > speeding up, the map-reduce task seems to take
> > > > > > > > > > > > forever! The slaves are not being used for some
> > > > > > > > > > > > reason. This second cluster has a lower per node
> > > > > > > > > > > > processing power, but should that make any
> > > > > > > > > > > > difference? How can I ensure that the data is
> > > > > > > > > > > > being mapped to all the nodes? Presently, the only
> > > > > > > > > > > > node that seems to be doing all the work is the
> > > > > > > > > > > > Master node.
> > > > > > > > > > > >
> > > > > > > > > > > > Do 15 nodes in a cluster increase the network
> > > > > > > > > > > > cost? What can I do to set up the cluster to
> > > > > > > > > > > > function more efficiently?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks!
> > > > > > > > > > > > Mithila Nagendra
> > > > > > > > > > > > Arizona State University
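As a footnote to Jim's point about splits above, the arithmetic can be sketched like this (the 64 MB default split size and the 516 MB / 5 GB figures are the ones from this thread):

```python
import math

# Rough map-task count: Hadoop launches roughly one map task per input
# split (64 MB by default), so a small input cannot keep 15 nodes busy.
def num_splits(input_bytes, split_bytes=64 * 1024 * 1024):
    """Estimate the number of map tasks a job's input will produce."""
    return math.ceil(input_bytes / split_bytes)

print(num_splits(516 * 1024 * 1024))  # 516 MB input -> 9 splits
print(num_splits(5 * 1024 ** 3))      # 5 GB input   -> 80 splits
```

Nine splits spread over 15 nodes (each typically running two map slots) leaves most of the cluster idle, which matches the behaviour described above.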

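Since the retry loop in the log above suggests node19 simply cannot reach node18:54310 even though ssh works, a bare TCP probe separates a firewall or binding problem from a Hadoop configuration problem (a generic sketch, not part of Hadoop; the hostname and port are the ones from this thread):

```python
import socket

def can_connect(host, port, timeout=2.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False

# Run on node19: can_connect("node18", 54310)
# If this is False while ssh to node18 works, suspect a firewall rule or
# the namenode binding to localhost instead of its network interface.
```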