[jira] [Commented] (HIVE-20587) Reduce phase could not be completed on MapReduce engine

Abhishek (Jira) Tue, 19 Nov 2019 03:40:11 -0800


    [ 
https://issues.apache.org/jira/browse/HIVE-20587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16977412#comment-16977412
 ]


Abhishek commented on HIVE-20587:
---------------------------------

it can be tuned through by editing vi 
/usr/local/hadoop-3.1.3/etc/hadoop/mapred-site.xml 

 

<property>
 <name>mapreduce.map.memory.mb</name>
 <value>4096</value>
 </property>
 <property>
 <name>mapreduce.reduce.memory.mb</name>
 <value>4096</value>
 </property>

rest could be fine

> Reduce phase could not be completed on MapReduce engine
> -------------------------------------------------------
>
>                 Key: HIVE-20587
>                 URL: https://issues.apache.org/jira/browse/HIVE-20587
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Mykhailo Kysliuk
>            Priority: Minor
>
> Steps to reproduce:
> {code}
> CREATE TABLE IF NOT EXISTS `test`(
>   `account_id` bigint, 
>   `subscription_id` bigint, 
>   `previous_subscription_id` bigint, 
>   `subscription_effective_from_dt` date, 
>   `subscription_effective_to_dt` date, 
>   `order_completed_dtm` timestamp, 
>   `subscription_status` string) STORED AS ORC;
> insert into test values (1, 1, 1, NULL, NULL, NULL, 'ss');
> SELECT
>   account_id,
>   subscription_id,
>   previous_subscription_id,
>   subscription_status,
>   subscription_effective_from_dt,
>   subscription_effective_to_dt 
> FROM 
> ( 
>   SELECT 
>     DISTINCT account_id, 
>     subscription_id, 
>     previous_subscription_id, 
>     subscription_status, 
>     subscription_effective_from_dt, 
>     subscription_effective_to_dt, 
>     order_completed_dtm, 
>     ROW_NUMBER()OVER(PARTITION BY ACCOUNT_ID) as r1
>   FROM test
>   WHERE account_id=1 
> ) AS RTE where r1=1;
> {code}
> Expected result:
> {code}
> OK
> 1     1       1       ss      NULL    NULL
> {code}
> Actual result:
> {code}
> Starting Job = job_1537276416358_0006, Tracking URL = 
> https://node10.cluster.com:8090/proxy/application_1537276416358_0006/
> Kill Command = /opt/mapr/hadoop/hadoop-2.7.0/bin/hadoop job  -kill 
> job_1537276416358_0006
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 
> 1
> 2018-09-18 13:21:00,754 Stage-1 map = 0%,  reduce = 0%
> 2018-09-18 13:21:04,913 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:22:05,152 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:23:05,540 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:24:06,532 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:25:07,477 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:26:07,914 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:27:08,021 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:28:08,935 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:29:08,970 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:30:09,054 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:31:09,965 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:32:10,922 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:33:10,964 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:34:11,005 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:35:11,978 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:36:13,057 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:37:13,208 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:38:13,394 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:39:13,719 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:40:13,729 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:41:14,103 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:42:14,160 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:43:14,241 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:44:14,345 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:45:15,353 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:46:15,546 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:47:15,608 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:48:16,481 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:49:16,522 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:50:16,774 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:51:17,784 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:52:17,838 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:53:17,868 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:54:18,010 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:55:18,287 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:56:18,310 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:57:18,628 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:58:19,061 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 13:59:19,359 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 14:00:19,534 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 14:01:19,707 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 14:02:20,670 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.3 
> sec
> 2018-09-18 14:03:01,193 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 1.3 sec
> MapReduce Total cumulative CPU time: 1 seconds 300 msec
> Ended Job = job_1537276416358_0006 with errors
> Error during job, obtaining debugging information...
> Examining task ID: task_1537276416358_0006_m_000000 (and more) from job 
> job_1537276416358_0006
> Task with the most failures(4): 
> -----
> Task ID:
>   task_1537276416358_0006_r_000000
> -----
> Diagnostic Messages for this Task:
> AttemptID:attempt_1537276416358_0006_r_000000_3 Timed out after 600 secs
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched: 
> Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 1.3 sec   HDFS Read: 0 
> HDFS Write: 0 FAIL
> Total MapReduce CPU Time Spent: 1 seconds 300 msec
> {code}
> On tez engine this query runs fine.
> At yarn logs:
> {code}
> 2018-09-18 14:53:18,012 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated 
> containers 1
> 2018-09-18 14:53:18,012 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned to reduce
> 2018-09-18 14:53:18,013 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Assigned 
> container container_e1322_1537276416358_0014_01_000003 to 
> attempt_1537276416358_0014_r_000000_0
> 2018-09-18 14:53:18,013 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
> PendingReds:0 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 AssignedReds:1 
> CompletedMaps:1 CompletedReds:0 ContAlloc:2 ContRel:0 HostLocal:1 RackLocal:0
> 2018-09-18 14:53:18,020 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapred.Task: mapOutputFile class: 
> org.apache.hadoop.mapred.MapRFsOutputFile
> 2018-09-18 14:53:18,021 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.yarn.util.RackResolver: Resolved node10.cluster.com to 
> /default-rack
> 2018-09-18 14:53:18,021 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1537276416358_0014_r_000000_0 TaskAttempt Transitioned from 
> UNASSIGNED to ASSIGNED
> 2018-09-18 14:53:18,021 INFO [ContainerLauncher #2] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing 
> the event EventType: CONTAINER_REMOTE_LAUNCH for container 
> container_e1322_1537276416358_0014_01_000003 taskAttempt 
> attempt_1537276416358_0014_r_000000_0
> 2018-09-18 14:53:18,021 INFO [ContainerLauncher #2] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Launching 
> attempt_1537276416358_0014_r_000000_0
> 2018-09-18 14:53:18,021 INFO [ContainerLauncher #2] 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: 
> Opening proxy : node10.cluster.com:8099
> 2018-09-18 14:53:18,030 INFO [ContainerLauncher #2] 
> org.apache.hadoop.mapreduce.v2.app.job.event.TaskAttemptContainerLaunchedEvent:
>  Shuffle port returned by ContainerManager for 
> attempt_1537276416358_0014_r_000000_0 : 13562
> 2018-09-18 14:53:18,030 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: TaskAttempt: 
> [attempt_1537276416358_0014_r_000000_0] using containerId: 
> [container_e1322_1537276416358_0014_01_000003 on NM: [node10.cluster.com:8099]
> 2018-09-18 14:53:18,030 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1537276416358_0014_r_000000_0 TaskAttempt Transitioned from ASSIGNED 
> to RUNNING
> 2018-09-18 14:53:18,030 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator: ATTEMPT_START 
> task_1537276416358_0014_r_000000
> 2018-09-18 14:53:18,031 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: 
> task_1537276416358_0014_r_000000 Task Transitioned from SCHEDULED to RUNNING
> 2018-09-18 14:53:19,018 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1537276416358_0014: ask=1 release= 0 newContainers=0 
> finishedContainers=0 resourcelimit=<memory:8784, vCores:1, disks:0.0> 
> knownNMs=1
> 2018-09-18 15:03:41,844 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1537276416358_0014_r_000000_0: 
> AttemptID:attempt_1537276416358_0014_r_000000_0 Timed out after 600 secs
> 2018-09-18 15:03:41,845 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1537276416358_0014_r_000000_0 TaskAttempt Transitioned from RUNNING 
> to FAIL_CONTAINER_CLEANUP
> 2018-09-18 15:03:41,846 INFO [ContainerLauncher #3] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing 
> the event EventType: CONTAINER_REMOTE_CLEANUP for container 
> container_e1322_1537276416358_0014_01_000003 taskAttempt 
> attempt_1537276416358_0014_r_000000_0
> 2018-09-18 15:03:41,847 INFO [ContainerLauncher #3] 
> org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING 
> attempt_1537276416358_0014_r_000000_0
> 2018-09-18 15:03:41,848 INFO [ContainerLauncher #3] 
> org.apache.hadoop.yarn.client.api.impl.ContainerManagementProtocolProxy: 
> Opening proxy : node10.cluster.com:8099
> 2018-09-18 15:03:41,881 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1537276416358_0014_r_000000_0 TaskAttempt Transitioned from 
> FAIL_CONTAINER_CLEANUP to FAIL_TASK_CLEANUP
> 2018-09-18 15:03:41,882 INFO [CommitterEvent Processor #1] 
> org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing 
> the event EventType: TASK_ABORT
> 2018-09-18 15:03:41,885 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1537276416358_0014_r_000000_0 TaskAttempt Transitioned from 
> FAIL_TASK_CLEANUP to FAILED
> 2018-09-18 15:03:41,888 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: 
> attempt_1537276416358_0014_r_000000_1 TaskAttempt Transitioned from NEW to 
> UNASSIGNED
> 2018-09-18 15:03:41,888 INFO [Thread-49] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on 
> node node10.cluster.com
> 2018-09-18 15:03:42,510 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before 
> Scheduling: PendingReds:1 ScheduledMaps:0 ScheduledReds:0 AssignedMaps:0 
> AssignedReds:1 CompletedMaps:1 CompletedReds:0 ContAlloc:2 ContRel:0 
> HostLocal:1 RackLocal:0
> 2018-09-18 15:03:42,513 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating 
> schedule, headroom=<memory:8784, vCores:1, disks:0.0>
> 2018-09-18 15:03:42,513 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: All maps 
> assigned. Ramping up all remaining reduces:1
> 2018-09-18 15:03:42,513 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
> PendingReds:0 ScheduledMaps:0 ScheduledReds:1 AssignedMaps:0 AssignedReds:1 
> CompletedMaps:1 CompletedReds:0 ContAlloc:2 ContRel:0 HostLocal:1 RackLocal:0
> 2018-09-18 15:03:43,515 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() 
> for application_1537276416358_0014: ask=1 release= 0 newContainers=0 
> finishedContainers=1 resourcelimit=<memory:12880, vCores:2, disks:1.33> 
> knownNMs=1
> 2018-09-18 15:03:43,515 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received 
> completed container container_e1322_1537276416358_0014_01_000003
> 2018-09-18 15:03:43,515 INFO [RMCommunicator Allocator] 
> org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: After Scheduling: 
> PendingReds:0 ScheduledMaps:0 ScheduledReds:1 AssignedMaps:0 AssignedReds:0 
> CompletedMaps:1 CompletedReds:0 ContAlloc:2 ContRel:0 HostLocal:1 RackLocal:0
> 2018-09-18 15:03:43,515 INFO [AsyncDispatcher event handler] 
> org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics 
> report from attempt_1537276416358_0014_r_000000_0: Container killed by the 
> ApplicationMaster.
> Container killed on request. Exit code is 143
> Container exited with a non-zero exit code 143. 
> {code}
> Maybe this is not a bug, but MR has not enough resources to run this query.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Commented] (HIVE-20587) Reduce phase could not be completed on MapReduce engine

Reply via email to