Hi, try the following changes to /etc/hosts on each slave.
Your current /etc/hosts on a slave (say cp-lab) is:

127.0.0.1     cp-lab localhost.localdomain localhost
127.0.1.1     cp-desktop
10.14.11.32   Abhishek-Master
10.14.13.18   manjeet-home manjeet-home.localdomain
10.105.18.1   vadehra vadehra.localdomain

Change it to the following on each slave and then try again:

127.0.0.1     localhost.localdomain localhost   # removed 'cp-lab' from here; the same hostname with two distinct IPs creates confusion
127.0.1.1     cp-desktop
10.14.11.32   Abhishek-Master
10.14.13.18   manjeet-home manjeet-home.localdomain
10.105.18.1   vadehra vadehra.localdomain
10.129.26.215 cp-lab cp-lab.localdomain         # add the slave's own entry

Also,
1. Hostnames are case-sensitive for Hadoop, so the hostnames in your master and slave config files should exactly match the hostnames in /etc/hosts and the hostnames of the participating nodes (as set/obtained with the Linux hostname command).
2. Check whether the slave nodes can reach each other (ping should work).

-ajit

On Sun, Aug 14, 2011 at 8:05 PM, sachinites <sachini...@gmail.com> wrote:
>
> Hello friends. I am an M.Tech student at IIT Bombay, working on a project
> using Hadoop. When I launch a job with one master and three slave nodes
> (the master is not a slave node itself), the map phase runs to completion
> successfully, but the reduce phase runs to about 16% and then fails with a
> shuffle error. Forums suggest this error arises when a slave running a
> reducer tries to fetch the map output from another slave node that ran the
> mapper. The problem is that the reducer slave is not able to resolve the
> hostname of the mapper slave, which causes the reducer slave to throw the
> shuffle error. The problem seems to be with the settings in the /etc/hosts
> file. The terminal output is below:
>
> 11/08/14 19:35:32 INFO HadoopSweepLine: Launching the job.
> 11/08/14 19:35:32 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 11/08/14 19:35:32 INFO mapred.FileInputFormat: Total input paths to process : 1
> 11/08/14 19:35:33 INFO mapred.JobClient: Running job: job_201108141930_0002
> 11/08/14 19:35:34 INFO mapred.JobClient: map 0% reduce 0%
> 11/08/14 19:35:44 INFO mapred.JobClient: map 50% reduce 0%
> 11/08/14 19:35:47 INFO mapred.JobClient: map 100% reduce 0%
> 11/08/14 19:35:53 INFO mapred.JobClient: map 100% reduce 8%
> 11/08/14 19:35:59 INFO mapred.JobClient: map 100% reduce 0%
> 11/08/14 19:36:01 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000000_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 11/08/14 19:36:01 WARN mapred.JobClient: Error reading task outputgrc1-desktop
> 11/08/14 19:36:01 WARN mapred.JobClient: Error reading task outputgrc1-desktop
> 11/08/14 19:36:03 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000001_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 11/08/14 19:36:03 WARN mapred.JobClient: Error reading task outputcp-desktop
> 11/08/14 19:36:03 WARN mapred.JobClient: Error reading task outputcp-desktop
> 11/08/14 19:36:13 INFO mapred.JobClient: map 100% reduce 8%
> 11/08/14 19:36:16 INFO mapred.JobClient: map 100% reduce 0%
> 11/08/14 19:36:18 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000000_1, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputcp-desktop
> 11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputcp-desktop
> 11/08/14 19:36:18 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000001_1, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputdove
> 11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputdove
> ..... and so on, until the job fails.
>
> Also, the job completes successfully with exactly one slave machine,
> because then the communication is only between the namenode and a single
> slave node; there is no slave-to-slave communication.
>
> It would be a great help if anyone running Hadoop (0.20.1) on Ubuntu with
> multiple datanodes (not in pseudo-distributed mode) could post the
> contents of the /etc/hosts files of both the master and the slaves.
>
> My /etc/hosts on the master is:
>
> 127.0.0.1 localhost.localdomain localhost
> 127.0.1.1 ubuntu
> 10.14.11.32 Abhishek-Master <<- Master node
> 10.14.13.18 manjeet-home manjeet-home.localdomain (slave)
> 10.129.26.215 cp-lab cp-lab.localdomain (slave)
> 10.105.18.1 vadehra vadehra.localdomain (slave)
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
>
> -------------
>
> /etc/hosts on a slave (say cp-lab) is:
>
> 127.0.0.1 cp-lab localhost.localdomain localhost
> 127.0.1.1 cp-desktop
> 10.14.11.32 Abhishek-Master
> 10.14.13.18 manjeet-home manjeet-home.localdomain
> 10.105.18.1 vadehra vadehra.localdomain
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> -----------------------------------------------
> My facebook: www.facebook.com/abhishek004hbti, email:
> sachini...@gmail.com
>
> Please, can somebody help me understand why the reducer slaves are not
> able to fetch the map output from the mapper slaves? Any help would be
> appreciated.
> Thanks & regards
> --
> View this message in context:
> http://old.nabble.com/Shuuling-Error-in-Reduce-Phase-tp32259596p32259596.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
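P.S. A quick way to sanity-check the /etc/hosts advice above on each node is a small resolution script like the one below. This is only a sketch: the hostnames are the ones mentioned in this thread and must be replaced with your own, and the loopback test mirrors what the shuffle phase depends on (each slave's own hostname must resolve to its LAN IP, not to 127.x.x.x, or other slaves cannot fetch its map output).

```python
# Sanity check for the /etc/hosts fixes discussed in this thread: verify
# that every cluster hostname resolves, and flag any name that resolves
# to a loopback address (the usual cause of
# "Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES").
import socket

# Hostnames taken from this thread -- adapt them to your cluster.
CLUSTER_HOSTS = ["Abhishek-Master", "manjeet-home", "cp-lab", "vadehra"]


def resolves_to_loopback(hostname):
    """Return True if hostname resolves to a 127.x.x.x address."""
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        print("UNRESOLVABLE: %s" % hostname)
        return False
    print("%s -> %s" % (hostname, ip))
    return ip.startswith("127.")


if __name__ == "__main__":
    # The node's own hostname (what the `hostname` command returns) must
    # map to its LAN IP in /etc/hosts, not to 127.0.0.1 / 127.0.1.1.
    own = socket.gethostname()
    if resolves_to_loopback(own):
        print("WARNING: %s resolves to loopback; fix /etc/hosts" % own)
    for h in CLUSTER_HOSTS:
        if resolves_to_loopback(h):
            print("WARNING: %s resolves to loopback on this node" % h)
```

Run it on every node (master and each slave); after the /etc/hosts changes above, no cluster hostname should print a loopback warning.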