[ https://issues.apache.org/jira/browse/HIVE-22941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17046467#comment-17046467 ]
László Bodor commented on HIVE-22941:
-------------------------------------

Issue reproduced:
{code}
export QTEST_LEAVE_FILES=true
mvn test -Dtest.output.overwrite=true -Pitests,hadoop-2 -Denforcer.skip=true -pl itests/qtest -Dtest=TestMiniLlapLocalCliDriver -Dqfile=empty_files_non_bucketed.q
...
lbodor@HW12459 ~/repos/hive HDP-3.1-maint ● ls -la itests/qtest/target/localfs/warehouse/t1/000000_0
-rw-r--r-- 1 lbodor staff 0 Feb 25 11:42 itests/qtest/target/localfs/warehouse/t1/000000_0
{code}

https://github.com/abstractdog/hive/commit/7e08a3f654d67848cc2f3a915ebb8294d98e4328

Easy fix, which causes an ACID/MM regression: https://github.com/abstractdog/hive/commit/8e25b5ce11220e22dbe90958d52c63b52a482931

> Empty files are inserted into external tables after HIVE-21714
> --------------------------------------------------------------
>
>                 Key: HIVE-22941
>                 URL: https://issues.apache.org/jira/browse/HIVE-22941
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
>
> There were multiple patches targeting an issue where INSERT OVERWRITE was
> ineffective if the input was empty:
> HIVE-18702: INSERT OVERWRITE TABLE doesn't clean the table directory before overwriting
> HIVE-21714: Insert overwrite on an acid/mm table is ineffective if the input is empty
> HIVE-21784: Insert overwrite on an acid (not mm) table is ineffective if the input is empty
>
> Of these patches, HIVE-21714 seems to have an unwanted effect on external
> tables, because of this part:
> https://github.com/apache/hive/commit/9a10bc28bee5250c0f667c94a295706a44ed4d7e#diff-9bea2581a1fba611f2c10904857b8823R1268
>
> The original issue before HIVE-21714 was that the original files in the table
> survived an INSERT OVERWRITE, so a select(*) still returned rows afterwards.
> HIVE-21714 seems to enable writing empty files regardless of execution engine
> or table type, which is not the proper approach: the proper solution would be
> to avoid writing empty files on Tez completely (this is what HIVE-14014 was
> about).
> I found that changing the condition to
> {code}
> if (!isTez && (isStreaming || this.isInsertOverwrite))
> {code}
> (which could be an easy solution for external tables) breaks some test cases
> (both full ACID and MM) in insert_overwrite.q, which suggests that they
> somehow rely on the generated empty file. We need to find a proper solution
> that is applicable to all table types without polluting external tables.
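To make the trade-off in the proposed condition concrete, here is a small standalone Java sketch (not Hive code) that evaluates !isTez && (isStreaming || isInsertOverwrite) for every combination of the three flags. The flag names come from the snippet in the description; the class EmptyFileCondition and the method createsEmptyFile are hypothetical, and the sketch assumes the condition gates whether a (possibly empty) output file is created for a task that received no rows.

```java
// Hypothetical sketch, not Hive code: evaluates the condition proposed in
// HIVE-22941 for every combination of its three flags. EmptyFileCondition and
// createsEmptyFile are illustrative names; only the flag names come from the issue.
class EmptyFileCondition {

    /**
     * Proposed rule from the issue, assumed here to decide whether an output
     * file is created even when no rows were written: only when NOT running
     * on Tez, and only for streaming writes or INSERT OVERWRITE.
     */
    static boolean createsEmptyFile(boolean isTez, boolean isStreaming, boolean isInsertOverwrite) {
        return !isTez && (isStreaming || isInsertOverwrite);
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        // Print a truth table: one row per flag combination.
        System.out.println("isTez isStreaming isInsertOverwrite -> createFile");
        for (boolean tez : values) {
            for (boolean streaming : values) {
                for (boolean overwrite : values) {
                    System.out.printf("%-5b %-11b %-17b -> %b%n",
                            tez, streaming, overwrite,
                            createsEmptyFile(tez, streaming, overwrite));
                }
            }
        }
    }
}
```

On Tez (isTez = true) the condition is false for every combination, so no empty file would be written for any table type. That is what keeps external tables clean, and it is also consistent with the ACID/MM insert_overwrite.q failures: the condition has no table-type input, so it cannot treat external tables differently from ACID/MM tables.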