[ 
https://issues.apache.org/jira/browse/HIVE-22941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

László Bodor updated HIVE-22941:
--------------------------------
    Description: 
There were multiple patches targeting the issue of INSERT OVERWRITE being 
ineffective when the input is empty:
HIVE-18702: INSERT OVERWRITE TABLE doesn't clean the table directory before 
overwriting
HIVE-21714: Insert overwrite on an acid/mm table is ineffective if the input is 
empty
HIVE-21784: Insert overwrite on an acid (not mm) table is ineffective if the 
input is empty

Of these patches, HIVE-21714 seems to have a bad effect on external tables 
because of this part:
https://github.com/apache/hive/commit/9a10bc28bee5250c0f667c94a295706a44ed4d7e#diff-9bea2581a1fba611f2c10904857b8823R1268

The original issue was that the existing files in the table survived an INSERT 
OVERWRITE, so a subsequent select still returned rows (count(*) > 0). 
HIVE-21714 seems to enable writing empty files regardless of the execution 
engine, which is not the right approach: the proper solution would be to avoid 
writing empty files on Tez completely (this is what HIVE-14014 was about). I 
found that changing the condition to:
{code}
if (!isTez && (isStreaming || this.isInsertOverwrite)) 
{code}
(which would be an easy fix for external tables) breaks some test cases in 
insert_overwrite.q, both full ACID and MM, which suggests that they somehow rely 
on the generated empty file. We need a proper solution that works for all table 
types without polluting external tables with empty files.

  was:
There were multiple patches targeting an issue when INSERT OVERWRITE was 
ineffective if the input is empty:
HIVE-18702: INSERT OVERWRITE TABLE doesn't clean the table directory before 
overwriting
HIVE-21714: Insert overwrite on an acid/mm table is ineffective if the input is 
empty
HIVE-21784: Insert overwrite on an acid (not mm) table is ineffective if the 
input is empty

From these patches, HIVE-21714 seems to have a bad effect on external tables, 
because of this part:
https://github.com/apache/hive/commit/9a10bc28bee5250c0f667c94a295706a44ed4d7e#diff-9bea2581a1fba611f2c10904857b8823R1268

The issue was that the original files in the table survived an insert 
overwrite, and select(*)>0 was after that. HIVE-21714 seems to enable writing 
empty files regardless of execution engine, which is not the proper way, as the 
proper solution would be to completely avoid writing empty files for Tez (this 
is what HIVE-14014 was about). I found that changing to logic to...
{code}
if (!isTez && (isStreaming || this.isInsertOverwrite)) 
{code}
(which could be an easy solution for external tables) breaks some test cases 
(both full ACID and MM) in insert_overwrite.q, which could mean they rely 
somehow on the empty generated file. We need to find a proper solution which is 
applicable for all table types without polluting external tables.


> Empty files are inserted into external tables after HIVE-21714
> --------------------------------------------------------------
>
>                 Key: HIVE-22941
>                 URL: https://issues.apache.org/jira/browse/HIVE-22941
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: László Bodor
>            Priority: Major
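
To make the difference between the two conditions concrete, here is a minimal Java sketch that tabulates when each one would write an empty file. It is not Hive code: the class and method names (EmptyFileDecision, hive21714, proposed) are hypothetical, and the HIVE-21714 condition is assumed, based on the description above, to be the proposed expression without the !isTez guard. The linked commit is authoritative for the real condition.

```java
/**
 * Hypothetical model of the empty-file decision discussed in HIVE-22941.
 * Not Hive code: names and the HIVE-21714 condition are assumptions.
 */
public class EmptyFileDecision {

    /**
     * ASSUMED HIVE-21714 behaviour: write an empty file for streaming or
     * INSERT OVERWRITE regardless of the execution engine (isTez is ignored).
     */
    public static boolean hive21714(boolean isTez, boolean isStreaming, boolean isInsertOverwrite) {
        return isStreaming || isInsertOverwrite;
    }

    /** Condition tried in HIVE-22941: never write the empty file on Tez. */
    public static boolean proposed(boolean isTez, boolean isStreaming, boolean isInsertOverwrite) {
        return !isTez && (isStreaming || isInsertOverwrite);
    }

    public static void main(String[] args) {
        boolean[] values = {false, true};
        System.out.println("isTez isStreaming isInsertOverwrite | HIVE-21714 proposed");
        // Enumerate all eight flag combinations and print both decisions.
        for (boolean tez : values) {
            for (boolean streaming : values) {
                for (boolean overwrite : values) {
                    System.out.printf("%-5b %-11b %-17b | %-10b %b%n",
                            tez, streaming, overwrite,
                            hive21714(tez, streaming, overwrite),
                            proposed(tez, streaming, overwrite));
                }
            }
        }
    }
}
```

The rows that differ are exactly those with isTez = true and either isStreaming or isInsertOverwrite set: there the assumed HIVE-21714 condition writes an empty file and the proposed one does not, which would explain the ACID and MM failures in insert_overwrite.q if those tests rely on that file.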



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
