[ https://issues.apache.org/jira/browse/HIVE-22969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marta Kuczora reassigned HIVE-22969:
------------------------------------

    Assignee:     (was: Marta Kuczora)

> Union remove optimisation results incorrect data when inserting to ACID table
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-22969
>                 URL: https://issues.apache.org/jira/browse/HIVE-22969
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Marta Kuczora
>            Priority: Major
>
> Steps to reproduce the issue:
> {noformat}
> create table input_text(key string, val string) stored as textfile location
> '/Users/martakuczora/work/hive/warehouse/external/input_text';
> create table output_acid(key string, val string) stored as orc
> tblproperties('transactional'='true');
> insert into input_text values ('1','1'), ('2','2'),('3','3');
> {noformat}
> {noformat}
> set hive.mapred.mode=nonstrict;
> set hive.stats.autogather=false;
> set hive.optimize.union.remove=true;
> set hive.auto.convert.join=true;
> set hive.exec.submitviachild=false;
> set hive.exec.submit.local.task.via.child=false;
> SELECT * FROM (
> select key, val from input_text
> union all
> select a.key as key, b.val as val FROM input_text a join input_text b on
> a.key=b.key) c;
> The result of the select:
> 1 1
> 2 2
> 3 3
> 1 1
> 2 2
> 3 3
> {noformat}
> {noformat}
> insert into table output_acid
> SELECT * FROM (
> select key, val from input_text
> union all
> select a.key as key, b.val as val FROM input_text a join input_text b on
> a.key=b.key) c;
> select * from output_acid;
> The result:
> 1 1
> 2 2
> 3 3
> {noformat}
> The folder of the output_acid table contained the following delta directories:
> {noformat}
> drwxr-xr-x 6 martakuczora staff 192 Mar 2 16:29 delta_0000000_0000000
> drwxr-xr-x 6 martakuczora staff 192 Mar 2 16:29 delta_0000001_0000001_0001
> {noformat}
> The statement ID is missing from the name of the first directory, so when a
> select statement runs on the table, this directory is ignored. That is why
> only half of the data is returned when running the select on the output_acid
> table.
> If either hive.stats.autogather is set to true or hive.optimize.union.remove
> is set to false, the result of the insert is correct. In this case there is
> only one delta directory in the table's folder.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
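
The observation about the two delta directory names in the report can be checked mechanically. The following is a minimal sketch, not Hive's own reader logic: it assumes the naming pattern delta_<minWriteId>_<maxWriteId>[_<statementId>], inferred only from the two names listed in the report, and the helper names (parse_delta_dir, missing_statement_id) are hypothetical. It flags directories whose name lacks the statement ID suffix.

```python
import re
from typing import List, NamedTuple, Optional

# Assumed pattern, inferred from the two directory names in the report:
#   delta_<minWriteId>_<maxWriteId>[_<statementId>]
DELTA_RE = re.compile(r"^delta_(\d+)_(\d+)(?:_(\d+))?$")


class DeltaDir(NamedTuple):
    min_write_id: int
    max_write_id: int
    statement_id: Optional[int]  # None when the suffix is absent


def parse_delta_dir(name: str) -> DeltaDir:
    """Split a delta directory name into its numeric parts."""
    m = DELTA_RE.match(name)
    if m is None:
        raise ValueError(f"not a delta directory name: {name!r}")
    lo, hi, stmt = m.groups()
    return DeltaDir(int(lo), int(hi), int(stmt) if stmt is not None else None)


def missing_statement_id(names: List[str]) -> List[str]:
    """Return the directory names that carry no statement ID suffix."""
    return [n for n in names if parse_delta_dir(n).statement_id is None]


if __name__ == "__main__":
    dirs = ["delta_0000000_0000000", "delta_0000001_0000001_0001"]
    print(missing_statement_id(dirs))  # ['delta_0000000_0000000']
```

A missing suffix is not an error in general, so this only highlights the asymmetry between the two directories written by the single insert described in the report.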