[ https://issues.apache.org/jira/browse/HIVE-22661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ádám Szita updated HIVE-22661:
------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

Committed to master. Thanks for reviewing, Laszlo and Peter.

> Compaction fails on non bucketed table with data loaded inpath
> --------------------------------------------------------------
>
>                 Key: HIVE-22661
>                 URL: https://issues.apache.org/jira/browse/HIVE-22661
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>             Fix For: 4.0.0
>
>         Attachments: HIVE-22661.0.patch, HIVE-22661.1.patch, HIVE-22661.2.patch
>
>
> Compaction cannot handle situations where:
> * data was ingested with {{LOAD DATA INPATH}}
> * this ingest method is run multiple times, and
> ** a different number of files is created in the delta directories on each run
> This results in file/dir structures such as:
> {code:java}
> /warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000
> /warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000/000000_0
> /warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000/000001_0
> /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000
> /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000000_0
> /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000001_0
> /warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000002_0
> {code}
> Although the table is not bucketed, a bucket ID is calculated from the (raw)
> files' names. Compaction in the above case fails because
> delta_0000001_0000001_0000 has no data for 'bucket' 2.
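The mismatch described above can be illustrated with a small sketch. This is not Hive's compactor code: the file-name pattern and the helper names (`bucket_from_filename`, `buckets_per_delta`, `missing_buckets`) are assumptions made for illustration only. It derives a 'bucket' number from each raw file name in the listing and reports which delta directory lacks a bucket that another delta has.

```python
import re
from collections import defaultdict

# Assumed shape of raw file names such as "000002_0" (illustrative only,
# not the pattern Hive itself uses).
RAW_FILE = re.compile(r"^(\d+)_\d+$")

# File listing copied from the issue description.
PATHS = [
    "/warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000/000000_0",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000001_0000001_0000/000001_0",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000000_0",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000001_0",
    "/warehouse/tablespace/managed/hive/comp3/delta_0000002_0000002_0000/000002_0",
]


def bucket_from_filename(name):
    """Return the leading number of a raw file name, e.g. '000002_0' -> 2."""
    match = RAW_FILE.match(name)
    if match is None:
        raise ValueError(f"not a raw file name: {name!r}")
    return int(match.group(1))


def buckets_per_delta(paths):
    """Map each delta directory name to the set of bucket numbers it holds."""
    result = defaultdict(set)
    for path in paths:
        parts = path.rstrip("/").split("/")
        if len(parts) < 2 or not parts[-2].startswith("delta_"):
            continue  # skip entries that are not files inside a delta dir
        result[parts[-2]].add(bucket_from_filename(parts[-1]))
    return dict(result)


def missing_buckets(per_delta):
    """For each delta, list the buckets that exist in some other delta only."""
    all_buckets = set().union(*per_delta.values()) if per_delta else set()
    return {
        delta: sorted(all_buckets - buckets)
        for delta, buckets in per_delta.items()
        if all_buckets - buckets
    }


if __name__ == "__main__":
    print(missing_buckets(buckets_per_delta(PATHS)))
```

For the listing in this issue, the only delta reported is `delta_0000001_0000001_0000`, which has no file for 'bucket' 2.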
> Steps to repro using small dataset:
> {code:java}
> -- Keep splits tiny so that even this small dataset is written as multiple files
> set tez.grouping.min-size=8;
> set tez.grouping.max-size=8;
> set mapreduce.input.fileinputformat.split.minsize=8;
> set mapreduce.input.fileinputformat.split.maxsize=8;
> create external table comp0 (a string);
> insert into comp0 values ("qwertyuiopasdfghjklzxcvbnm");
> insert into comp0 values ("qwertyuiopasdfghjklzxcvbnm");
> create external table comp1 stored as orc as select * from comp0;
> insert into comp0 values ("qwertyuiopasdfghjklzxcvbnm");
> create external table comp2 stored as orc as select * from comp0;
> -- Load both external tables' files into the managed table, one delta per load
> create table comp3 (a string);
> load data inpath '/warehouse/tablespace/external/hive/comp1' into table comp3;
> load data inpath '/warehouse/tablespace/external/hive/comp2' into table comp3;
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)