Hi, try the following changes to /etc/hosts on each slave.
Your current /etc/hosts on a slave (say cp-lab) is:

127.0.0.1     cp-lab localhost.localdomain localhost
127.0.1.1     cp-desktop
10.14.11.32   Abhishek-Master
10.14.13.18   manjeet-home manjeet-home.localdomain
10.105.18.1   vadehra vadehra.localdomain

Change it to the following on each slave and then try again:

127.0.0.1     localhost.localdomain localhost   # removed 'cp-lab' from here; the same hostname with two distinct IPs creates confusion
127.0.1.1     cp-desktop
10.14.11.32   Abhishek-Master
10.14.13.18   manjeet-home manjeet-home.localdomain
10.105.18.1   vadehra vadehra.localdomain
10.129.26.215 cp-lab cp-lab.localdomain         # add the slave's own entry

Also,
1. Hostnames are case-sensitive for Hadoop, so the hostnames in your master and slave config files should exactly match the hostnames in /etc/hosts and the hostnames of the participating nodes (as set/obtained with the Linux hostname command).
2. Check whether the slave nodes can reach each other (ping should work).

-ajit

On Sun, Aug 14, 2011 at 8:05 PM, sachinites <sachini...@gmail.com> wrote:
>
> Hello friends. I am an M.Tech student at IIT Bombay, working on a project
> using Hadoop. When I launch a job with one master and three slave nodes
> (the master is not a slave node itself), the map phase runs to completion
> successfully, but the reduce phase runs to about 16% and then fails with a
> shuffle error. Forums suggest this error arises when a slave running a
> reducer tries to fetch the map output from another slave node that ran the
> mapper. The problem is that the reducer slave is not able to resolve the
> hostname of the mapper slave, which causes the reducer slave to throw the
> shuffle error. The problem seems to be with the settings in the /etc/hosts
> file. The terminal output is below:
>
> 11/08/14 19:35:32 INFO HadoopSweepLine: Launching the job.
> 11/08/14 19:35:32 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 11/08/14 19:35:32 INFO mapred.FileInputFormat: Total input paths to process : 1
> 11/08/14 19:35:33 INFO mapred.JobClient: Running job: job_201108141930_0002
> 11/08/14 19:35:34 INFO mapred.JobClient: map 0% reduce 0%
> 11/08/14 19:35:44 INFO mapred.JobClient: map 50% reduce 0%
> 11/08/14 19:35:47 INFO mapred.JobClient: map 100% reduce 0%
> 11/08/14 19:35:53 INFO mapred.JobClient: map 100% reduce 8%
> 11/08/14 19:35:59 INFO mapred.JobClient: map 100% reduce 0%
> 11/08/14 19:36:01 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000000_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 11/08/14 19:36:01 WARN mapred.JobClient: Error reading task outputgrc1-desktop
> 11/08/14 19:36:01 WARN mapred.JobClient: Error reading task outputgrc1-desktop
> 11/08/14 19:36:03 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000001_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 11/08/14 19:36:03 WARN mapred.JobClient: Error reading task outputcp-desktop
> 11/08/14 19:36:03 WARN mapred.JobClient: Error reading task outputcp-desktop
> 11/08/14 19:36:13 INFO mapred.JobClient: map 100% reduce 8%
> 11/08/14 19:36:16 INFO mapred.JobClient: map 100% reduce 0%
> 11/08/14 19:36:18 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000000_1, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputcp-desktop
> 11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputcp-desktop
> 11/08/14 19:36:18 INFO mapred.JobClient: Task Id : attempt_201108141930_0002_r_000001_1, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputdove
> 11/08/14 19:36:18 WARN mapred.JobClient: Error reading task outputdove
> ..... and so on, until the job fails.
>
> Also, the job completes successfully with exactly one slave machine,
> because then the communication is only between the namenode and a single
> slave node; there is no slave-to-slave communication.
>
> It would be a great help if anyone running Hadoop (0.20.1) on Ubuntu with
> multiple datanodes (not in pseudo-distributed mode) could post the
> contents of the /etc/hosts files of both the master and the slaves.
>
> My /etc/hosts on the master is:
>
> 127.0.0.1 localhost.localdomain localhost
> 127.0.1.1 ubuntu
> 10.14.11.32 Abhishek-Master <<- Master node
> 10.14.13.18 manjeet-home manjeet-home.localdomain (slave)
> 10.129.26.215 cp-lab cp-lab.localdomain (slave)
> 10.105.18.1 vadehra vadehra.localdomain (slave)
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 localhost ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> ff02::3 ip6-allhosts
>
> -------------
>
> /etc/hosts on a slave (say cp-lab) is:
>
> 127.0.0.1 cp-lab localhost.localdomain localhost
> 127.0.1.1 cp-desktop
> 10.14.11.32 Abhishek-Master
> 10.14.13.18 manjeet-home manjeet-home.localdomain
> 10.105.18.1 vadehra vadehra.localdomain
>
> # The following lines are desirable for IPv6 capable hosts
> ::1 ip6-localhost ip6-loopback
> fe00::0 ip6-localnet
> ff00::0 ip6-mcastprefix
> ff02::1 ip6-allnodes
> ff02::2 ip6-allrouters
> -----------------------------------------------
> My facebook: www.facebook.com/abhishek004hbti, email:
> sachini...@gmail.com
>
> Please, can somebody help me understand why the reducer slaves are not
> able to fetch the map output from the mapper slaves? Any help would be
> appreciated.
> Thanks & regards
> --
> View this message in context:
> http://old.nabble.com/Shuuling-Error-in-Reduce-Phase-tp32259596p32259596.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
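P.S. A quick way to sanity-check the /etc/hosts advice above on each node is a small resolution script like the one below. This is only a sketch: the hostnames are the ones mentioned in this thread and must be replaced with your own, and the loopback test mirrors what the shuffle phase depends on (each slave's own hostname must resolve to its LAN IP, not to 127.x.x.x, or other slaves cannot fetch its map output).

```python
# Sanity check for the /etc/hosts fixes discussed in this thread: verify
# that every cluster hostname resolves, and flag any name that resolves
# to a loopback address (the usual cause of
# "Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES").
import socket

# Hostnames taken from this thread -- adapt them to your cluster.
CLUSTER_HOSTS = ["Abhishek-Master", "manjeet-home", "cp-lab", "vadehra"]


def resolves_to_loopback(hostname):
    """Return True if hostname resolves to a 127.x.x.x address."""
    try:
        ip = socket.gethostbyname(hostname)
    except socket.gaierror:
        print("UNRESOLVABLE: %s" % hostname)
        return False
    print("%s -> %s" % (hostname, ip))
    return ip.startswith("127.")


if __name__ == "__main__":
    # The node's own hostname (what the `hostname` command returns) must
    # map to its LAN IP in /etc/hosts, not to 127.0.0.1 / 127.0.1.1.
    own = socket.gethostname()
    if resolves_to_loopback(own):
        print("WARNING: %s resolves to loopback; fix /etc/hosts" % own)
    for h in CLUSTER_HOSTS:
        if resolves_to_loopback(h):
            print("WARNING: %s resolves to loopback on this node" % h)
```

Run it on every node (master and each slave); after the /etc/hosts changes above, no cluster hostname should print a loopback warning.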