did you look for network-related errors in the logs. i've found instances when a VPN connection confused networking in hadoop. (could be a JVM issue, didn't spend the time to know for sure)
thanks, thushara On Wed, Sep 30, 2009 at 1:46 AM, achilles852 <faheemk...@gmail.com> wrote: > > Running Hadoop on EC2 using 32 bit small instances with 5 slaves. The job > has > 5 map tasks and 1 reduce task. Each map task is 5 minutes long. > After 5 minutes and 14 seconds, 4 map tasks completed but the last one is > now stalling the entire process. > > Here are the logs for JobTracker: > ---------------------------------------- > [start of JobTracker Logs] > ---------------------------------------- > 2009-09-30 04:31:24,468 INFO org.apache.hadoop.mapred.JobTracker: Adding > task 'attempt_200909300419_0001_m_000006_0' to tip > task_200909300419_0001_m_000006, for tracker > 'tracker_domU-12-31-39-02-66-21.compute-1.internal:localhost.localdomain/ > 127.0.0.1:58857' > 2009-09-30 04:31:28,009 INFO org.apache.hadoop.mapred.JobInProgress: Task > 'attempt_200909300419_0001_m_000006_0' has completed > task_200909300419_0001_m_000006 successfully. > 2009-09-30 04:31:28,012 INFO org.apache.hadoop.mapred.JobTracker: Adding > task 'attempt_200909300419_0001_m_000001_0' to tip > task_200909300419_0001_m_000001, for tracker > 'tracker_domU-12-31-39-02-66-21.compute-1.internal:localhost.localdomain/ > 127.0.0.1:58857' > 2009-09-30 04:31:28,015 INFO org.apache.hadoop.mapred.JobInProgress: > Choosing data-local task task_200909300419_0001_m_000001 > 2009-09-30 04:31:28,505 INFO org.apache.hadoop.mapred.JobTracker: Adding > task 'attempt_200909300419_0001_m_000002_0' to tip > task_200909300419_0001_m_000002, for tracker > 'tracker_domU-12-31-39-02-F5-D8.compute-1.internal:localhost.localdomain/ > 127.0.0.1:60031' > 2009-09-30 04:31:28,505 INFO org.apache.hadoop.mapred.JobInProgress: > Choosing data-local task task_200909300419_0001_m_000002 > 2009-09-30 04:31:29,577 INFO org.apache.hadoop.mapred.JobTracker: Adding > task 'attempt_200909300419_0001_m_000000_0' to tip > task_200909300419_0001_m_000000, for tracker > 'tracker_domU-12-31-39-03-BA-14.compute-1.internal:localhost.localdomain/ > 127.0.0.1:54570' > 2009-09-30 04:31:29,577 INFO org.apache.hadoop.mapred.JobInProgress: > Choosing data-local task task_200909300419_0001_m_000000 > 2009-09-30 04:31:30,851 INFO org.apache.hadoop.mapred.JobTracker: Adding > task 'attempt_200909300419_0001_m_000003_0' to tip > task_200909300419_0001_m_000003, for tracker > 'tracker_domU-12-31-39-03-41-B6.compute-1.internal:localhost.localdomain/ > 127.0.0.1:42359' > 2009-09-30 04:31:30,851 INFO org.apache.hadoop.mapred.JobInProgress: > Choosing data-local task task_200909300419_0001_m_000003 > 2009-09-30 04:31:31,965 INFO org.apache.hadoop.mapred.JobTracker: Adding > task 'attempt_200909300419_0001_m_000004_0' to tip > task_200909300419_0001_m_000004, for tracker > 'tracker_domU-12-31-39-02-25-A1.compute-1.internal:localhost.localdomain/ > 127.0.0.1:56865' > 2009-09-30 04:31:31,965 INFO org.apache.hadoop.mapred.JobInProgress: > Choosing data-local task task_200909300419_0001_m_000004 > 2009-09-30 04:31:33,095 INFO org.apache.hadoop.mapred.JobTracker: Adding > task 'attempt_200909300419_0001_r_000000_0' to tip > task_200909300419_0001_r_000000, for tracker > 'tracker_domU-12-31-39-02-66-21.compute-1.internal:localhost.localdomain/ > 127.0.0.1:58857' > 2009-09-30 04:36:33,940 INFO org.apache.hadoop.mapred.JobInProgress: Task > 'attempt_200909300419_0001_m_000000_0' has completed > task_200909300419_0001_m_000000 successfully. > 2009-09-30 04:36:33,940 INFO org.apache.hadoop.mapred.ResourceEstimator: > measured blowup on task_200909300419_0001_m_000000 was 803/200 = 4.015 > 2009-09-30 04:36:33,941 INFO org.apache.hadoop.mapred.ResourceEstimator: > new > estimate is blowup = 4.015 > 2009-09-30 04:36:33,945 INFO org.apache.hadoop.mapred.JobTracker: Adding > task 'attempt_200909300419_0001_m_000002_1' to tip > task_200909300419_0001_m_000002, for tracker > 'tracker_domU-12-31-39-03-BA-14.compute-1.internal:localhost.localdomain/ > 127.0.0.1:54570' > 2009-09-30 04:36:33,945 INFO org.apache.hadoop.mapred.JobInProgress: > Choosing data-local task task_200909300419_0001_m_000002 > 2009-09-30 04:36:34,793 INFO org.apache.hadoop.mapred.JobInProgress: Task > 'attempt_200909300419_0001_m_000002_0' has completed > task_200909300419_0001_m_000002 successfully. > 2009-09-30 04:36:34,793 INFO org.apache.hadoop.mapred.ResourceEstimator: > measured blowup on task_200909300419_0001_m_000002 was 317/190 = > 1.668421052631579 > 2009-09-30 04:36:34,794 INFO org.apache.hadoop.mapred.ResourceEstimator: > new > estimate is blowup = 2.841710526315789 > 2009-09-30 04:36:34,796 INFO org.apache.hadoop.mapred.JobTracker: Adding > task 'attempt_200909300419_0001_m_000003_1' to tip > task_200909300419_0001_m_000003, for tracker > 'tracker_domU-12-31-39-02-F5-D8.compute-1.internal:localhost.localdomain/ > 127.0.0.1:60031' > 2009-09-30 04:36:34,796 INFO org.apache.hadoop.mapred.JobInProgress: > Choosing data-local task task_200909300419_0001_m_000003 > 2009-09-30 04:36:35,170 INFO org.apache.hadoop.mapred.JobInProgress: Task > 'attempt_200909300419_0001_m_000003_0' has completed > task_200909300419_0001_m_000003 successfully. > 2009-09-30 04:36:35,170 INFO org.apache.hadoop.mapred.ResourceEstimator: > measured blowup on task_200909300419_0001_m_000003 was 73/189 = > 0.3862433862433862 > 2009-09-30 04:36:35,170 INFO org.apache.hadoop.mapred.ResourceEstimator: > new > estimate is blowup = 2.023221479624988 > 2009-09-30 04:36:35,171 INFO org.apache.hadoop.mapred.JobTracker: Adding > task 'attempt_200909300419_0001_m_000001_1' to tip > task_200909300419_0001_m_000001, for tracker > 'tracker_domU-12-31-39-03-41-B6.compute-1.internal:localhost.localdomain/ > 127.0.0.1:42359' > 2009-09-30 04:36:35,171 INFO org.apache.hadoop.mapred.JobInProgress: > Choosing data-local task task_200909300419_0001_m_000001 > 2009-09-30 04:36:35,854 INFO org.apache.hadoop.mapred.JobTracker: > attempt_200909300419_0001_m_000002_1 is 1909 ms debug. > 2009-09-30 04:36:36,038 INFO org.apache.hadoop.mapred.JobInProgress: Task > 'attempt_200909300419_0001_m_000004_0' has completed > task_200909300419_0001_m_000004 successfully. > 2009-09-30 04:36:36,039 INFO org.apache.hadoop.mapred.ResourceEstimator: > measured blowup on task_200909300419_0001_m_000004 was 918/187 = > 4.909090909090909 > 2009-09-30 04:36:36,039 INFO org.apache.hadoop.mapred.ResourceEstimator: > new > estimate is blowup = 2.744688836991468 > 2009-09-30 04:36:43,966 INFO org.apache.hadoop.mapred.JobTracker: Removed > completed task 'attempt_200909300419_0001_m_000002_1' from > 'tracker_domU-12-31-39-03-BA-14.compute-1.internal:localhost.localdomain/ > 127.0.0.1:54570' > 2009-09-30 04:36:44,853 INFO org.apache.hadoop.mapred.JobTracker: Removed > completed task 'attempt_200909300419_0001_m_000003_1' from > 'tracker_domU-12-31-39-02-F5-D8.compute-1.internal:localhost.localdomain/ > 127.0.0.1:60031' > ---------------------------------------- > [End of JobTracker Logs] > ---------------------------------------- > > > Amogh Vasekar-2 wrote: > > > > Hi, > > Can you provide info on the input like compression etc? Also, are you > > using cached files in your map tasks? It might be helpful if you paste > the > > logs here after blanking your system specific info., as then one can find > > out where till the reduce it went or if the copy phase started at all. > > > > Thanks, > > Amogh > > > > -----Original Message----- > > From: achilles852 [mailto:faheemk...@gmail.com] > > Sent: Wednesday, September 30, 2009 6:38 AM > > To: core-...@hadoop.apache.org > > Subject: Re: last map task taking too long > > > > > > Basically, it finishes what it is supposed to do (I view the logs to find > > out), but does not move onto the reduce stage. > > > > > > Ted Dunning wrote: > >> > >> Is that last map task actually running, or is it pending? > >> > >> On Tue, Sep 29, 2009 at 5:57 PM, achilles852 <faheemk...@gmail.com> > >> wrote: > >> > >>> > >>> Hey.. I am trying to write a small mapreduce program. I launch a few > map > >>> tasks, each of which should complete within a certain time (say 5 > >>> minutes)... all the tasks complete within 5 minutes except the last one > >>> - > >>> which takes around 10 times more the time taken by all other map > >>> tasks.....any idea why this is happening? > >>> > >>> I am using Hadoop version 0.19.2, tried running it locally as well as > on > >>> EC2. > >>> -- > >>> View this message in context: > >>> > http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25673359.html > >>> Sent from the Hadoop core-dev mailing list archive at Nabble.com. > >>> > >>> > >> > >> > >> -- > >> Ted Dunning, CTO > >> DeepDyve > >> > >> > > > > -- > > View this message in context: > > > http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25673431.html > > Sent from the Hadoop core-dev mailing list archive at Nabble.com. > > > > > > > > -- > View this message in context: > http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25677160.html > Sent from the Hadoop core-dev mailing list archive at Nabble.com. > >