[ https://issues.apache.org/jira/browse/HIVE-28962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17954394#comment-17954394 ]

Denys Kuzmenko commented on HIVE-28962:
---------------------------------------

Merged to master.
Thanks [~simhadri-g] and [~abstractdog] for the review!

> Prevent committing outputs if an exception was thrown in the Tez processor
> --------------------------------------------------------------------------
>
>                 Key: HIVE-28962
>                 URL: https://issues.apache.org/jira/browse/HIVE-28962
>             Project: Hive
>          Issue Type: Bug
>          Components: Iceberg integration
>            Reporter: Denys Kuzmenko
>            Assignee: Denys Kuzmenko
>            Priority: Major
>              Labels: pull-request-available
>
> A communication failure between the coordinator and the executor led to a task kill 
> (06:04:18.328Z):
> {code:java}
> app=query-coordinator-0/2025-05-16-06-00_query-coordinator_query-coordinator-0-7_638e93f4-c948-4fc0-ad05-1451fc4b7c53_0.log:<14>1
>  2025-05-16T06:04:18.328Z query-coordinator-0-7 query-coordinator 1 
> 638e93f4-c948-4fc0-ad05-1451fc4b7c53 [mdc@38374 
> class="HistoryEventHandler.criticalEvents" level="INFO" thread="Dispatcher 
> thread {Central}"] 
> [HISTORY][DAG:dag_1746256596566_0014_1634][Event:TASK_ATTEMPT_FINISHED]: 
> vertexName=Reducer 3, 
> taskAttemptId=attempt_1746256596566_0014_1634_07_000070_13, 
> creationTime=1747375458134, allocationTime=1747375458277, 
> startTime=1747375458294, finishTime=1747375458328, timeTaken=34, 
> status=KILLED, errorEnum=NODE_FAILED, diagnostics=Node with same host and 
> port but with new unique ID pinged, nodeHttpAddress=<>, counters=Counters: 1, 
> org.apache.tez.common.counters.TaskCounter, TASK_DURATION_MILLIS=34
> {code}
> Then an InterruptedException was thrown by `ReduceRecordProcessor.init()` 
> because of the task attempt kill (06:04:27.442Z):
> {code:java}
> <11>1 2025-05-16T06:04:27.442Z query-executor-0-16 query-executor 1 
> 71c82503-aeaa-4ca8-ad1c-c01db6df4f9d [mdc@38374 class="tez.TezProcessor" 
> dagId="dag_1746256596566_0014_1634" 
> fragmentId="1746256596566_0014_1634_07_000070_13" level="ERROR" 
> queryId="hive_20250516055906_b7f7cff1-fc84-44d2-a7c5-30bd9aa428c5" 
> thread="TezTR-596566_14_1634_7_70_13"] java.lang.InterruptedException
>     at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2056)
>     at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2090)
>     at org.apache.tez.runtime.InputReadyTracker$InputReadyMonitor.awaitCondition(InputReadyTracker.java:147)
>     at org.apache.tez.runtime.InputReadyTracker.waitForAllInputsReady(InputReadyTracker.java:107)
>     at org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAllInputsReady(TezProcessorContextImpl.java:138)
>     at org.apache.tez.runtime.api.impl.TezProcessorContextImpl.waitForAllInputsReady(TezProcessorContextImpl.java:133)
>     at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:122)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296)
>     at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:280)
>     at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:86)
>     at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:72)
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAs(Subject.java:423)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:72)
>     at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:42)
>     at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>     at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:118)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>     at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>     at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
> However, the task was still allowed to commit, and a commit manifest was 
> created (06:04:27.442Z):
> {code:java}
> <14>1 2025-05-16T06:04:27.442Z query-executor-0-16 query-executor 1 
> 71c82503-aeaa-4ca8-ad1c-c01db6df4f9d [mdc@38374 
> class="hive.HiveIcebergOutputCommitter" dagId="dag_1746256596566_0014_1634" 
> fragmentId="1746256596566_0014_1634_07_000070_13" level="INFO" 
> queryId="hive_20250516055906_b7f7cff1-fc84-44d2-a7c5-30bd9aa428c5" 
> thread="TezTR-596566_14_1634_7_70_13"] Created Iceberg commitTask manifest 
> file: [s3a://temp/hive_20250516055906_b7f7cff1-fc84-44d2-a7c5-30bd9aa428c5-job_17462565965667_0014/task-70.forCommit]
> FilesForCommit{dataFiles=[], deleteFiles=[], replacedDataFiles=[], referencedDataFiles=[]}
> {code}
> This is a regression introduced by HIVE-24857.
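
The failure mode above comes down to the output commit not taking the processor's outcome into account: the task attempt was killed and init() failed, yet the commit still ran. The Java sketch below illustrates the general guard pattern (commit only after a normal return, abort on any throwable). It is not the patch merged for this issue, and `CommitGuardSketch`, `RecordingOutput`, `ProcessorBody`, `runAndMaybeCommit` and `runScenario` are hypothetical names, not Hive or Tez APIs. A `CountDownLatch` stands in for `waitForAllInputsReady()`, and interrupting the current thread stands in for the task attempt kill.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch, not Hive/Tez code: commit outputs only if the processor body succeeded.
public class CommitGuardSketch {

    /** Hypothetical stand-in for a task output; records whether it was committed or aborted. */
    static final class RecordingOutput {
        final List<String> events = new ArrayList<>();

        void commit() { events.add("commit"); }

        void abort() { events.add("abort"); }
    }

    /** Hypothetical stand-in for the processor's init-and-run logic. */
    interface ProcessorBody {
        void run() throws Exception;
    }

    /** Runs the body; commits on a normal return, aborts (and lets the exception propagate) otherwise. */
    static void runAndMaybeCommit(ProcessorBody body, RecordingOutput output) throws Exception {
        boolean failed = true; // pessimistic until run() returns normally
        try {
            body.run();
            failed = false;
        } finally {
            if (failed) {
                output.abort();   // e.g. InterruptedException from a killed attempt
            } else {
                output.commit();
            }
        }
    }

    /** Simulates one task attempt; killed=true interrupts the thread before it waits for inputs. */
    static List<String> runScenario(boolean killed) {
        RecordingOutput output = new RecordingOutput();
        // Count 1 never reaches zero here (inputs never ready); count 0 means inputs are ready.
        CountDownLatch inputsReady = new CountDownLatch(killed ? 1 : 0);
        if (killed) {
            Thread.currentThread().interrupt(); // stand-in for the task attempt kill
        }
        try {
            // await() throws InterruptedException at once if the interrupt flag is set, clearing it.
            runAndMaybeCommit(inputsReady::await, output);
        } catch (InterruptedException e) {
            // Expected for the killed attempt; the output has already been aborted.
            // The interrupt flag was consumed by await(); real task code would usually re-assert it.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return output.events;
    }

    public static void main(String[] args) {
        System.out.println("killed attempt:     " + runScenario(true));   // [abort]
        System.out.println("successful attempt: " + runScenario(false));  // [commit]
    }
}
{code}

Initializing `failed` to `true` means that any throwable from the processor body, including an `InterruptedException` raised while waiting for inputs, leads to an abort; only a normal return flips the flag and allows the commit.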



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
