Sean Curtis wrote:
In failed/killed task attempts, I see the following:
attempt_201012141048_0023_m_000000_0  (task_201012141048_0023_m_000000, 172.24.10.91)  FAILED: Too many fetch-failures
attempt_201012141048_0023_m_000000_1  (task_201012141048_0023_m_000000, 172.24.10.91)  FAILED: Too many fetch-failures
attempt_201012141048_0023_m_000001_0  (task_201012141048_0023_m_000001, 172.24.10.91)  FAILED: Too many fetch-failures
attempt_201012141048_0023_m_000001_1  (task_201012141048_0023_m_000001, 172.24.10.91)  FAILED: Too many fetch-failures
attempt_201012141048_0023_r_000000_0  (task_201012141048_0023_r_000000, 172.24.10.91)  FAILED: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
(Each attempt links to its task logs as Last 4KB / Last 8KB / All, e.g.
http://172.24.10.91:50060/tasklog?attemptid=attempt_201012141048_0023_m_000000_0&start=-4097)
The value you have in your hadoop-site.xml for hadoop.tmp.dir will get you into trouble:
/tmp/hadoop/tmp/dir/hadoop-${user.name}
Many systems remove items from /tmp that are older than some time interval, so intermediate data kept there can disappear out from under a running job.
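A minimal sketch of a safer setting in hadoop-site.xml, assuming a durable local directory such as /var/lib/hadoop exists on every node (that path is only an assumption; any local directory outside /tmp that the OS does not clean automatically will do):

  <property>
    <name>hadoop.tmp.dir</name>
    <!-- assumed path; pick any local directory that is not purged by the OS -->
    <value>/var/lib/hadoop/tmp/hadoop-${user.name}</value>
  </property>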
Are there any apparently relevant messages in the task tracker logs?
With two nodes and a small number of reducers, the tasktracker.http.threads change is unlikely to be part of the issue.
In general, the shuffle phase is simply transferring the sorted map outputs to the reducers and merge-sorting the results.
The errors tend to fall into two types: failed or blocked transfers, and merge-sort failures.
Failed or blocked transfers tend to be caused either by too many simultaneous requests to a task tracker, which is governed by tasktracker.http.threads (it sets how many requests can be serviced at once; see the sketch below), or by firewall rules that block the actual transfer.
Merge-sort failures tend to be out-of-memory or out-of-disk-space issues.
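If you do end up needing more serving threads on a larger cluster, a minimal sketch for hadoop-site.xml (the value 400 is only illustrative; the 0.20 default is 40):

  <property>
    <name>tasktracker.http.threads</name>
    <!-- illustrative value; raise only if map-output serving is the bottleneck -->
    <value>400</value>
  </property>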
There are a few JIRAs open for shuffle errors in other cases:
http://issues.apache.org/jira/browse/HADOOP-3604
http://issues.apache.org/jira/browse/HADOOP-3155 <- likely cause, fixed in the Cloudera 0.18.3 distribution
http://issues.apache.org/jira/browse/HADOOP-4115
http://issues.apache.org/jira/browse/HADOOP-3130
http://issues.apache.org/jira/browse/HADOOP-2095
attempt_201012141048_0023_r_000000_1  (task_201012141048_0023_r_000000, 172.24.10.91)  FAILED: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
attempt_201012141048_0023_r_000000_2  (task_201012141048_0023_r_000000, 172.24.10.91)  FAILED: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
attempt_201012141048_0023_r_000000_3  (task_201012141048_0023_r_000000, 172.24.10.91)  FAILED: Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
On Dec 20, 2010, at 11:01 PM, Adarsh Sharma wrote:
Sean Curtis wrote:
Just running a simple select count(1) from a table (using MovieLens as an example) doesn't seem to work for me. Anyone know why this doesn't work? I'm using Hive trunk:
hive> select avg(rating) from movierating where movieid=43;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201012141048_0023, Tracking URL =
http://localhost:50030/jobdetails.jsp?jobid=job_201012141048_0023
Kill Command = /Users/Sean/dev/hadoop-0.20.2+737/bin/../bin/hadoop
job -Dmapred.job.tracker=localhost:8021 -kill job_201012141048_0023
2010-12-20 15:15:03,295 Stage-1 map = 0%, reduce = 0%
2010-12-20 15:15:09,420 Stage-1 map = 50%, reduce = 0%
... eventually fails after a couple of minutes with:
2010-12-20 17:33:01,113 Stage-1 map = 100%, reduce = 0%
2010-12-20 17:33:32,182 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201012141048_0023 with errors
FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.MapRedTask
hive>
It almost seems like the reduce task never starts. Any help would be appreciated.
Sean
To find the root cause of the problem, go to the JobTracker web UI (IP:50030) and check the Job Tracker History at the bottom for this job ID.
Best Regards
Adarsh Sharma