[ https://issues.apache.org/jira/browse/HIVE-26179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengchenyu updated HIVE-26179:
-------------------------------
    Description: 
In our cluster, we found an error like this:
{code:java}
Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
    ... 16 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
    ... 17 more
{code}
When Tez container reuse is enabled and the query uses MapJoinOperator, an NPE is thrown if different attempts of the same task execute in the same container.

While debugging, I found that the second task attempt uses the first attempt's asyncInitOperations. Because asyncInitOperations is not cleared when the operator is closed, the second attempt may pick up the first attempt's mapJoinTables, whose HybridHashTableContainer.HashPartition has already been closed, and this causes the NPE.
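
The mechanism above can be illustrated with a minimal, self-contained model. This is a sketch under stated assumptions, not Hive's code: `ReusedMapJoinOperator`, `ReuseDemo` and the nested `HashPartition` are hypothetical stand-ins, and the model assumes that (a) both attempts reuse the same operator instance and (b), as a simplification, initialization takes its hash table from the first future in `asyncInitOperations`.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical, simplified model of a map-join operator instance that is kept
// and reused by successive task attempts running in the same Tez container.
class ReusedMapJoinOperator {
    // Hypothetical stand-in for HybridHashTableContainer.HashPartition.
    static class HashPartition {
        private Map<String, String> hashMap = new HashMap<>();

        void close() {
            hashMap.clear(); // throws NullPointerException if already closed
            hashMap = null;
        }
    }

    // Futures registered during initialization. A List (rather than a set)
    // keeps this model deterministic.
    final List<Future<HashPartition>> asyncInitOperations = new ArrayList<>();
    HashPartition mapJoinTable;

    void initialize() {
        // Each attempt registers its own asynchronous hash-table load.
        asyncInitOperations.add(CompletableFuture.completedFuture(new HashPartition()));
        try {
            // Model assumption: the first registered future supplies the table,
            // so a leftover future from an earlier attempt wins.
            mapJoinTable = asyncInitOperations.get(0).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void close() {
        // Loosely mirrors closeOp() releasing the hash tables.
        mapJoinTable.close();
        // Modelled bug: asyncInitOperations is NOT cleared here.
    }
}

class ReuseDemo {
    // Runs two attempts on one operator instance and returns the exception
    // thrown while closing the second attempt, or null if none was thrown.
    static RuntimeException runTwoAttempts(ReusedMapJoinOperator op) {
        op.initialize(); // attempt 0 loads its own partition
        op.close();      // attempt 0 closes that partition
        op.initialize(); // attempt 1 picks up attempt 0's closed partition
        try {
            op.close();
            return null;
        } catch (RuntimeException e) {
            return e;
        }
    }

    public static void main(String[] args) {
        System.out.println(runTwoAttempts(new ReusedMapJoinOperator()));
    }
}
{code}

Running `ReuseDemo.main` prints a `NullPointerException` raised while closing the second attempt, which loosely mirrors the `MapJoinOperator.closeOp` frame in the stack trace; the exact NPE site inside Hive is not reproduced.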

We must clear asyncInitOperations when the operator is closed.
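
The proposed fix can be illustrated with a small, self-contained model. `FixedMapJoinOperator`, `FixDemo` and the nested `HashPartition` are hypothetical names, not Hive's API, and where the actual patch places the `clear()` call in Hive's operator close path is not shown; the sketch only demonstrates the effect of emptying `asyncInitOperations` at close time, assuming (as a simplification) that initialization takes its table from the first registered future.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Hypothetical, simplified operator model (not Hive's Operator class) showing
// the proposed fix: clear asyncInitOperations when the operator is closed.
class FixedMapJoinOperator {
    // Hypothetical stand-in for HybridHashTableContainer.HashPartition.
    static class HashPartition {
        private Map<String, String> hashMap = new HashMap<>();

        boolean isClosed() { return hashMap == null; }

        void close() {
            hashMap.clear(); // throws NullPointerException if already closed
            hashMap = null;
        }
    }

    final List<Future<HashPartition>> asyncInitOperations = new ArrayList<>();
    HashPartition mapJoinTable;

    void initialize() {
        asyncInitOperations.add(CompletableFuture.completedFuture(new HashPartition()));
        try {
            // Model assumption: the first registered future supplies the table.
            mapJoinTable = asyncInitOperations.get(0).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    void close() {
        mapJoinTable.close();
        mapJoinTable = null;
        // The fix: drop this attempt's futures so a later attempt that reuses
        // this operator instance starts from an empty list.
        asyncInitOperations.clear();
    }
}

class FixDemo {
    public static void main(String[] args) {
        FixedMapJoinOperator op = new FixedMapJoinOperator();
        for (int attempt = 0; attempt < 2; attempt++) {
            op.initialize();
            System.out.println("attempt " + attempt + " got a fresh table: " + !op.mapJoinTable.isClosed());
            op.close();
        }
    }
}
{code}

With the `clear()` in place, each attempt that reuses the operator instance initializes from its own freshly loaded table, so the second attempt never closes a partition that the first attempt already closed.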


> In Tez reuse container mode, asyncInitOperations are not cleared.
> -----------------------------------------------------------------
>
>                 Key: HIVE-26179
>                 URL: https://issues.apache.org/jira/browse/HIVE-26179
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Tez
>    Affects Versions: 1.2.1
>         Environment: engine: Tez (Note: tez.am.container.reuse.enabled is true)
>            Reporter: zhengchenyu
>            Assignee: zhengchenyu
>            Priority: Major
>             Fix For: 4.0.0
>



--
This message was sent by Atlassian Jira
(v8.20.7#820007)