did you look for network-related errors in the logs. i've found instances
when a VPN connection confused networking in hadoop. (could be a JVM issue,
didn't spend the time to know for sure)

thanks,
thushara

On Wed, Sep 30, 2009 at 1:46 AM, achilles852 <faheemk...@gmail.com> wrote:

>
> Running Hadoop on EC2 using 32 bit small instances with 5 slaves. The job
> has
> 5 map tasks and 1 reduce task. Each map task is 5 minutes long.
> After 5 minutes and 14 seconds, 4 map tasks completed but the last one is
> now stalling the entire process.
>
> Here are the logs for JobTracker:
> ----------------------------------------
> [start of JobTracker Logs]
> ----------------------------------------
> 2009-09-30 04:31:24,468 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_200909300419_0001_m_000006_0' to tip
> task_200909300419_0001_m_000006, for tracker
> 'tracker_domU-12-31-39-02-66-21.compute-1.internal:localhost.localdomain/
> 127.0.0.1:58857'
> 2009-09-30 04:31:28,009 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_200909300419_0001_m_000006_0' has completed
> task_200909300419_0001_m_000006 successfully.
> 2009-09-30 04:31:28,012 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_200909300419_0001_m_000001_0' to tip
> task_200909300419_0001_m_000001, for tracker
> 'tracker_domU-12-31-39-02-66-21.compute-1.internal:localhost.localdomain/
> 127.0.0.1:58857'
> 2009-09-30 04:31:28,015 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_200909300419_0001_m_000001
> 2009-09-30 04:31:28,505 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_200909300419_0001_m_000002_0' to tip
> task_200909300419_0001_m_000002, for tracker
> 'tracker_domU-12-31-39-02-F5-D8.compute-1.internal:localhost.localdomain/
> 127.0.0.1:60031'
> 2009-09-30 04:31:28,505 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_200909300419_0001_m_000002
> 2009-09-30 04:31:29,577 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_200909300419_0001_m_000000_0' to tip
> task_200909300419_0001_m_000000, for tracker
> 'tracker_domU-12-31-39-03-BA-14.compute-1.internal:localhost.localdomain/
> 127.0.0.1:54570'
> 2009-09-30 04:31:29,577 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_200909300419_0001_m_000000
> 2009-09-30 04:31:30,851 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_200909300419_0001_m_000003_0' to tip
> task_200909300419_0001_m_000003, for tracker
> 'tracker_domU-12-31-39-03-41-B6.compute-1.internal:localhost.localdomain/
> 127.0.0.1:42359'
> 2009-09-30 04:31:30,851 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_200909300419_0001_m_000003
> 2009-09-30 04:31:31,965 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_200909300419_0001_m_000004_0' to tip
> task_200909300419_0001_m_000004, for tracker
> 'tracker_domU-12-31-39-02-25-A1.compute-1.internal:localhost.localdomain/
> 127.0.0.1:56865'
> 2009-09-30 04:31:31,965 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_200909300419_0001_m_000004
> 2009-09-30 04:31:33,095 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_200909300419_0001_r_000000_0' to tip
> task_200909300419_0001_r_000000, for tracker
> 'tracker_domU-12-31-39-02-66-21.compute-1.internal:localhost.localdomain/
> 127.0.0.1:58857'
> 2009-09-30 04:36:33,940 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_200909300419_0001_m_000000_0' has completed
> task_200909300419_0001_m_000000 successfully.
> 2009-09-30 04:36:33,940 INFO org.apache.hadoop.mapred.ResourceEstimator:
> measured blowup on task_200909300419_0001_m_000000 was 803/200 = 4.015
> 2009-09-30 04:36:33,941 INFO org.apache.hadoop.mapred.ResourceEstimator:
> new
> estimate is blowup = 4.015
> 2009-09-30 04:36:33,945 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_200909300419_0001_m_000002_1' to tip
> task_200909300419_0001_m_000002, for tracker
> 'tracker_domU-12-31-39-03-BA-14.compute-1.internal:localhost.localdomain/
> 127.0.0.1:54570'
> 2009-09-30 04:36:33,945 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_200909300419_0001_m_000002
> 2009-09-30 04:36:34,793 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_200909300419_0001_m_000002_0' has completed
> task_200909300419_0001_m_000002 successfully.
> 2009-09-30 04:36:34,793 INFO org.apache.hadoop.mapred.ResourceEstimator:
> measured blowup on task_200909300419_0001_m_000002 was 317/190 =
> 1.668421052631579
> 2009-09-30 04:36:34,794 INFO org.apache.hadoop.mapred.ResourceEstimator:
> new
> estimate is blowup = 2.841710526315789
> 2009-09-30 04:36:34,796 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_200909300419_0001_m_000003_1' to tip
> task_200909300419_0001_m_000003, for tracker
> 'tracker_domU-12-31-39-02-F5-D8.compute-1.internal:localhost.localdomain/
> 127.0.0.1:60031'
> 2009-09-30 04:36:34,796 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_200909300419_0001_m_000003
> 2009-09-30 04:36:35,170 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_200909300419_0001_m_000003_0' has completed
> task_200909300419_0001_m_000003 successfully.
> 2009-09-30 04:36:35,170 INFO org.apache.hadoop.mapred.ResourceEstimator:
> measured blowup on task_200909300419_0001_m_000003 was 73/189 =
> 0.3862433862433862
> 2009-09-30 04:36:35,170 INFO org.apache.hadoop.mapred.ResourceEstimator:
> new
> estimate is blowup = 2.023221479624988
> 2009-09-30 04:36:35,171 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task 'attempt_200909300419_0001_m_000001_1' to tip
> task_200909300419_0001_m_000001, for tracker
> 'tracker_domU-12-31-39-03-41-B6.compute-1.internal:localhost.localdomain/
> 127.0.0.1:42359'
> 2009-09-30 04:36:35,171 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_200909300419_0001_m_000001
> 2009-09-30 04:36:35,854 INFO org.apache.hadoop.mapred.JobTracker:
> attempt_200909300419_0001_m_000002_1 is 1909 ms debug.
> 2009-09-30 04:36:36,038 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_200909300419_0001_m_000004_0' has completed
> task_200909300419_0001_m_000004 successfully.
> 2009-09-30 04:36:36,039 INFO org.apache.hadoop.mapred.ResourceEstimator:
> measured blowup on task_200909300419_0001_m_000004 was 918/187 =
> 4.909090909090909
> 2009-09-30 04:36:36,039 INFO org.apache.hadoop.mapred.ResourceEstimator:
> new
> estimate is blowup = 2.744688836991468
> 2009-09-30 04:36:43,966 INFO org.apache.hadoop.mapred.JobTracker: Removed
> completed task 'attempt_200909300419_0001_m_000002_1' from
> 'tracker_domU-12-31-39-03-BA-14.compute-1.internal:localhost.localdomain/
> 127.0.0.1:54570'
> 2009-09-30 04:36:44,853 INFO org.apache.hadoop.mapred.JobTracker: Removed
> completed task 'attempt_200909300419_0001_m_000003_1' from
> 'tracker_domU-12-31-39-02-F5-D8.compute-1.internal:localhost.localdomain/
> 127.0.0.1:60031'
> ----------------------------------------
> [End of JobTracker Logs]
> ----------------------------------------
>
>
> Amogh Vasekar-2 wrote:
> >
> > Hi,
> > Can you provide info on the input like compression etc? Also, are you
> > using cached files in your map tasks? It might be helpful if you paste
> the
> > logs here after blanking your system specific info., as then one can find
> > out where till the reduce it went or if the copy phase started at all.
> >
> > Thanks,
> > Amogh
> >
> > -----Original Message-----
> > From: achilles852 [mailto:faheemk...@gmail.com]
> > Sent: Wednesday, September 30, 2009 6:38 AM
> > To: core-...@hadoop.apache.org
> > Subject: Re: last map task taking too long
> >
> >
> > Basically, it finishes what it is supposed to do (I view the logs to find
> > out), but does not move onto the reduce stage.
> >
> >
> > Ted Dunning wrote:
> >>
> >> Is that last map task actually running, or is it pending?
> >>
> >> On Tue, Sep 29, 2009 at 5:57 PM, achilles852 <faheemk...@gmail.com>
> >> wrote:
> >>
> >>>
> >>> Hey.. I am trying to write a small mapreduce program. I launch a few
> map
> >>> tasks, each of which should complete within a certain time (say 5
> >>> minutes)... all the tasks complete within 5 minutes except the last one
> >>> -
> >>> which takes around 10 times more the time taken by all other map
> >>> tasks.....any idea why this is happening?
> >>>
> >>> I am using Hadoop version 0.19.2, tried running it locally as well as
> on
> >>> EC2.
> >>> --
> >>> View this message in context:
> >>>
> http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25673359.html
> >>> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
> >>>
> >>>
> >>
> >>
> >> --
> >> Ted Dunning, CTO
> >> DeepDyve
> >>
> >>
> >
> > --
> > View this message in context:
> >
> http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25673431.html
> > Sent from the Hadoop core-dev mailing list archive at Nabble.com.
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/last-map-task-taking-too-long-tp25673359p25677160.html
> Sent from the Hadoop core-dev mailing list archive at Nabble.com.
>
>

Reply via email to