Hi,

The input is a plain text file. I use the parameters specified in the input file to launch a process on each machine and then collect the results. I am not using cached files; everything needed is contained in the job jar file. Each map task is supposed to finish within one minute.
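To make the per-task work concrete, here is a rough sketch of the pattern described above: take one line of parameters, launch a child process with them, and collect its output with a time limit. This is not the actual job code and not Hadoop code. The helper name `run_with_params`, the 60-second limit, and the echo command standing in for the real program are all assumptions made for illustration.

```python
import subprocess
import sys


def run_with_params(param_line, timeout_s=60):
    """Launch one child process for one line of parameters and collect its output.

    Hypothetical helper: the command below (a Python one-liner that echoes its
    arguments) is a stand-in for the real program, which the post does not name.
    """
    args = param_line.split()
    cmd = [sys.executable, "-c", "import sys; print(' '.join(sys.argv[1:]))"] + args
    try:
        # subprocess.run kills the child and raises TimeoutExpired if it runs too long.
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None, "timed out"
    return done.returncode, done.stdout.strip()


if __name__ == "__main__":
    # Assumed sample parameter lines, one per map input record.
    for line in ["alpha 1", "beta 2"]:
        print(line, "->", run_with_params(line))
```

Bounding the child with an explicit timeout makes a hung process show up as a clear "timed out" result instead of a task that silently never finishes.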
Here's the output from the reduce phase, where things get stuck. I am running Hadoop in pseudo-distributed mode.

[code]
2009-09-30 06:27:38,601 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Need another 2 map output(s) where 0 is already in progress
2009-09-30 06:27:38,603 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 0 new map-outputs
2009-09-30 06:27:38,603 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2009-09-30 06:28:33,623 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 1 new map-outputs
2009-09-30 06:28:33,624 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2009-09-30 06:28:33,628 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 2 bytes (6 raw bytes) into RAM from attempt_200909292242_0017_m_000007_0
2009-09-30 06:28:33,628 INFO org.apache.hadoop.mapred.ReduceTask: Read 2 bytes from map-output for attempt_200909292242_0017_m_000007_0
2009-09-30 06:28:33,629 INFO org.apache.hadoop.mapred.ReduceTask: Rec #1 from attempt_200909292242_0017_m_000007_0 -> (-1, -1) from pc01
2009-09-30 06:28:40,624 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2009-09-30 06:28:40,625 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 0 new map-outputs
2009-09-30 06:28:40,626 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2009-09-30 06:29:40,639 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2009-09-30 06:29:40,640 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 0 new map-outputs
2009-09-30 06:29:40,641 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2009-09-30 06:30:40,655 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2009-09-30 06:30:40,657 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 0 new map-outputs
2009-09-30 06:30:40,657 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2009-09-30 06:31:40,677 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2009-09-30 06:31:40,679 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 0 new map-outputs
2009-09-30 06:31:40,679 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2009-09-30 06:32:40,692 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2009-09-30 06:32:40,693 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 0 new map-outputs
2009-09-30 06:32:40,694 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2009-09-30 06:33:40,708 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2009-09-30 06:33:40,710 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 0 new map-outputs
2009-09-30 06:33:40,710 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2009-09-30 06:34:40,731 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2009-09-30 06:34:40,733 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 0 new map-outputs
2009-09-30 06:34:40,733 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2009-09-30 06:35:40,753 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0 Need another 1 map output(s) where 0 is already in progress
2009-09-30 06:35:40,755 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 1 new map-outputs
[/code]

Amogh Vasekar-2 wrote:
>
> Hi,
> Can you provide info on the input like compression etc? Also, are you
> using cached files in your map tasks? It might be helpful if you paste the
> logs here after blanking your system specific info., as then one can find
> out where till the reduce it went or if the copy phase started at all.
>
> Thanks,
> Amogh
>
> -----Original Message-----
> From: achilles852 [mailto:faheemk...@gmail.com]
> Sent: Wednesday, September 30, 2009 6:38 AM
> To: core-...@hadoop.apache.org
> Subject: Re: last map task taking too long
>
>
> Basically, it finishes what it is supposed to do (I view the logs to find
> out), but does not move onto the reduce stage.
>
>
> Ted Dunning wrote:
>>
>> Is that last map task actually running, or is it pending?
>>
>> On Tue, Sep 29, 2009 at 5:57 PM, achilles852 <faheemk...@gmail.com>
>> wrote:
>>
>>>
>>> Hey.. I am trying to write a small mapreduce program. I launch a few map
>>> tasks, each of which should complete within a certain time (say 5
>>> minutes)... all the tasks complete within 5 minutes except the last one -
>>> which takes around 10 times more the time taken by all other map
>>> tasks.....any idea why this is happening?
>>>
>>> I am using Hadoop version 0.19.2, tried running it locally as well as on
>>> EC2.
>>> --
>>> View this message in context:
>>> http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25673359.html
>>> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>> --
>> Ted Dunning, CTO
>> DeepDyve
>>
>>
>
> --
> View this message in context:
> http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25673431.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
>
>
>

--
View this message in context: http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25675439.html
Sent from the Hadoop core-dev mailing list archive at Nabble.com.
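
To put a number on the stall visible in the reduce log above, here is a minimal Python sketch that extracts the polls where the reducer actually received a map output and reports the time between them. The function names `arrivals` and `gaps_seconds` are made up for this example, and `SAMPLE` contains three lines copied from the pasted log. The regular expressions assume the log format shown there.

```python
import re
from datetime import datetime

# Timestamp, milliseconds, and message of a ReduceTask INFO line (format as in the pasted log).
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) INFO "
    r"org\.apache\.hadoop\.mapred\.ReduceTask: (.*)$"
)
GOT_RE = re.compile(r"Got (\d+) new map-outputs")


def arrivals(log_text):
    """Return (timestamp, count) for every poll that fetched at least one map output."""
    found = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        got = GOT_RE.search(m.group(3))
        if got and int(got.group(1)) > 0:
            stamp = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            stamp = stamp.replace(microsecond=int(m.group(2)) * 1000)
            found.append((stamp, int(got.group(1))))
    return found


def gaps_seconds(events):
    """Seconds elapsed between consecutive arrival events."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]


# Three lines taken from the log in this message.
SAMPLE = """\
2009-09-30 06:27:38,603 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 0 new map-outputs
2009-09-30 06:28:33,623 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 1 new map-outputs
2009-09-30 06:35:40,755 INFO org.apache.hadoop.mapred.ReduceTask: attempt_200909292242_0017_r_000000_0: Got 1 new map-outputs
"""

if __name__ == "__main__":
    for gap in gaps_seconds(arrivals(SAMPLE)):
        print(f"{gap:.1f} s between map-output arrivals")
```

On these lines the gap between the two arrivals is about 427 seconds, so the reducer waited roughly seven minutes for the final map output while polling about once a minute.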