[ https://issues.apache.org/jira/browse/HIVE-22373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16958011#comment-16958011 ]
Ashutosh Chauhan commented on HIVE-22373:
-----------------------------------------

+1

> File Merge tasks fail when containers are reused
> ------------------------------------------------
>
>                 Key: HIVE-22373
>                 URL: https://issues.apache.org/jira/browse/HIVE-22373
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.2
>            Reporter: Toshihiko Uchida
>            Assignee: Toshihiko Uchida
>            Priority: Major
>         Attachments: HIVE-22373.patch
>
>
> h1. Problems
> Setting tez.am.container.reuse.enabled=true allows containers to be reused across multiple tasks.
> When two File Merge tasks run on the same container, the later task fails to rename its output path.
> Below is the error log of task 000001_0 on container container_e87_1570604853053_11564_01_000003, where task 000004_0 ran before task 000001_0.
> It shows that the output file name of task 000001_0 is mistakenly taken from the previous task ID, 000004_0.
> {code}
> 2019-10-15 13:00:31,438 [ERROR] [TezChild] |tez.TezProcessor|: java.lang.RuntimeException: Hive Runtime Error while closing operators
>       at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:188)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:284)
>       at org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
>       at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
>       at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to close AbstractFileMergeOperator
>       at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:315)
>       at org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.closeOp(OrcFileMergeOperator.java:265)
>       at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:733)
>       at org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.close(MergeFileRecordProcessor.java:180)
>       ... 17 more
> Caused by: java.io.IOException: Unable to rename viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_task_tmp.-ext-10000/_tmp.000004_0 to viewfs://<cluster_name>/user/<user_name>/.hive-staging_hive_2019-10-15_12-59-32_916_2461818728035733124-15476/_tmp.-ext-10000/000004_0
>       at org.apache.hadoop.hive.ql.exec.AbstractFileMergeOperator.closeOp(AbstractFileMergeOperator.java:254)
>       ... 20 more
> {code}
> h1. Causes
> When AbstractFileMergeOperator is initialized, taskId is set only the first time.
> - AbstractFileMergeOperator.java
> {code}
>   private void updatePaths(Path tp, Path ttp) {
>     if (taskId == null) {
>       taskId = Utilities.getTaskId(jc);
>     }
> {code}
> As a result, a later task on the same container keeps the earlier task's ID, which causes the output file name conflict shown above.
> h1. Solutions
> Remove the null check, which was introduced in HIVE-14640, and update taskId from the JobConf whenever the operator is initialized.
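
The cause and the proposed fix can be illustrated with a minimal, self-contained sketch. This is not Hive code: TaskIdReuseSketch, MergeOperator, CachingOperator and RefreshingOperator are hypothetical names, and the sketch assumes that the taskId field survives from one task to the next when a container is reused, as the error log suggests. The task IDs are the two from the log.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskIdReuseSketch {

    /** Hypothetical stand-in for an operator whose state outlives a single task. */
    interface MergeOperator {
        /** Called once per task; returns the output file name the task would use. */
        String initialize(String taskIdFromJobConf);
    }

    /** Mirrors the null check in the updatePaths excerpt: the ID is set only once. */
    static final class CachingOperator implements MergeOperator {
        private String taskId;

        @Override
        public String initialize(String taskIdFromJobConf) {
            if (taskId == null) {
                taskId = taskIdFromJobConf;
            }
            return taskId;
        }
    }

    /** Mirrors the proposed fix: the ID is refreshed on every initialization. */
    static final class RefreshingOperator implements MergeOperator {
        private String taskId;

        @Override
        public String initialize(String taskIdFromJobConf) {
            taskId = taskIdFromJobConf;
            return taskId;
        }
    }

    /** Runs task 000004_0 and then task 000001_0 against the same operator. */
    static List<String> outputNames(MergeOperator reusedOperator) {
        List<String> names = new ArrayList<>();
        for (String id : new String[] {"000004_0", "000001_0"}) {
            names.add(reusedOperator.initialize(id));
        }
        return names;
    }

    public static void main(String[] args) {
        // Both tasks resolve to 000004_0, so the second rename collides.
        System.out.println("cached:    " + outputNames(new CachingOperator()));
        // Each task gets its own output name.
        System.out.println("refreshed: " + outputNames(new RefreshingOperator()));
    }
}
```

With the cached variant the second task reports 000004_0 as its output name, matching the rename target in the log; with the refreshed variant it reports 000001_0.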