[ https://issues.apache.org/jira/browse/HIVE-23451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105941#comment-17105941 ]
Rajesh Balamohan commented on HIVE-23451:
-----------------------------------------

I had the same confusion about how this codepath treats index "0"; the .1 patch simply followed the approach of reducing the duplicate invocation.

A closer look at the codepath reveals that {{totalFiles}} is set to 1 by default in SemanticAnalyzer. With bucketing, it is set to the number of buckets.

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6942] (set to the number of buckets)

[https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java#L6787] (set to 1 by default)

The special handling of index "0" isn't actually needed. I will attach the .2 version shortly.

> FileSinkOperator calls deleteOnExit (hdfs call) twice for the same file
> -----------------------------------------------------------------------
>
>                 Key: HIVE-23451
>                 URL: https://issues.apache.org/jira/browse/HIVE-23451
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-23451.1.patch, HIVE-23451.2.patch
>
>
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L826]
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L797]
> Can avoid a NN call here (i.e., mainly for small queries).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)