[ https://issues.apache.org/jira/browse/HIVE-18429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323020#comment-16323020 ]
Prasanth Jayachandran commented on HIVE-18429:
----------------------------------------------

I don't understand the part where we create the .empty file. What is the purpose of it? Why not create the directory for TMP_LOCATION using the FS API directly when it does not exist?

> Compaction should handle a case when it produces no output
> ----------------------------------------------------------
>
>                 Key: HIVE-18429
>                 URL: https://issues.apache.org/jira/browse/HIVE-18429
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.0.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>         Attachments: HIVE-18429.01.patch, HIVE-18429.02.patch
>
>
> Suppose we start with empty delta_8_8 and delta_9_9 and compaction runs.
> It will currently produce an MR job with 0 splits, and so
> {{CompactorMR.TMP_LOCATION}} never gets created. This causes
> {{CompactorOutputCommitter.commitJob()}} to fail when it tries to do
> {{FileStatus[] contents = fs.listStatus(tmpLocation);}} since tmpLocation
> doesn't exist.
> If the compactor fails to produce delta_8_9 here, it will fail to do further
> compaction unless a new delta with data is created.
> If the number of empty deltas is greater than
> HiveConf.ConfVars.COMPACTOR_MAX_NUM_DELTA, compaction will not be able to
> proceed at all.
> It should produce a delta_8_9 in this case even if it's empty.
> The error (in the log of the standalone metastore process) would look like this:
> {noformat}
> 2017-12-27 17:19:28,850 ERROR CommitterEvent Processor #1 org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Could not commit job
> java.io.FileNotFoundException: File hdfs://OTCHaaS/apps/hive/warehouse/momi.db/sensor_data/babyid=5911806ebf69640100004257/_tmp_b4c5a3f3-44e5-4d45-86af-5b773bf0fc96 does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:923)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:985)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:981)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:992)
>     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR$CompactorOutputCommitter.commitJob(CompactorMR.java:785)
>     at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:291)
>     at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:285)
>     at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.run(CommitterEventHandler.java:237)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
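
For illustration only, the failure mode and the "create the directory when it does not exist" idea from the comment can be sketched as below. This is not the Hive patch and not Hive code: it uses java.nio.file as a local stand-in for the Hadoop FileSystem API (the rough analogues would be FileSystem.mkdirs and FileSystem.listStatus), and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical stand-in for the commit step; names are illustrative, not Hive's. */
final class TmpLocationSketch {
    private TmpLocationSketch() {}

    /** Mirrors the failing pattern: listing a directory that was never created. */
    static List<Path> listUnguarded(Path tmpLocation) throws IOException {
        // Files.list throws an IOException if tmpLocation does not exist.
        try (Stream<Path> entries = Files.list(tmpLocation)) {
            return entries.collect(Collectors.toList());
        }
    }

    /** Creates the directory first, so a job with 0 splits yields an empty listing. */
    static List<Path> listAfterEnsuring(Path tmpLocation) throws IOException {
        // createDirectories does nothing if the directory already exists.
        Files.createDirectories(tmpLocation);
        return listUnguarded(tmpLocation);
    }
}
```

With a path that was never written to, listUnguarded fails in the same way commitJob() does in the trace above, while listAfterEnsuring returns an empty list that a caller could turn into an empty delta_8_9.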