[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhishek Somani updated HIVE-20901:
-----------------------------------
    Attachment: HIVE-20901.2.patch

> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Abhishek Somani
>            Priority: Major
>         Attachments: HIVE-20901.1.patch, HIVE-20901.2.patch
>
>
> Suppose we run minor compaction 2 times, via alter table.
> The 2nd request to compaction should have nothing to do but I don't think
> there is a check for that. It's visible in the context of HIVE-20823, where
> each compactor run produces a delta with a new visibility suffix, so we end up
> with something like
> {noformat}
> target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
> ├── delete_delta_0000001_0000002_v0000019
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delete_delta_0000001_0000002_v0000021
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delta_0000001_0000001_0000
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delta_0000001_0000002_v0000019
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delta_0000001_0000002_v0000021
> │   ├── _orc_acid_version
> │   └── bucket_00000
> └── delta_0000002_0000002_0000
>     ├── _orc_acid_version
>     └── bucket_00000
> {noformat}
> i.e. 2 deltas with the same write ID range.
> This is bad. Probably happens today as well, but the new run produces a delta
> with the same name and clobbers the previous one, which may interfere with
> writers.
>
> Need to investigate.
>
> -The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both
> deltas as if they were distinct and it effectively duplicates data.- There
> is no data duplication - {{getAcidState()}} will not use 2 deltas with the
> same {{writeid}} range.
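The symptom described above (two compactor-produced deltas covering the same write ID range) can be spotted from a directory listing alone. The following is a minimal sketch, not Hive code: `DuplicateDeltaFinder` and `findDuplicates` are hypothetical names, and the regular expression assumes only the naming shown in the listing, where a compacted directory ends in a `_v<number>` visibility suffix. It does not use or imitate `AcidUtils`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Hive.
public class DuplicateDeltaFinder {

    // Assumed naming, taken from the listing in this issue:
    //   delta_<minWriteId>_<maxWriteId>_v<visibilityId>
    //   delete_delta_<minWriteId>_<maxWriteId>_v<visibilityId>
    // Directories without the _v suffix (e.g. delta_0000001_0000001_0000)
    // are not compactor output in this sketch and are skipped.
    private static final Pattern COMPACTED_DELTA =
        Pattern.compile("^(delete_delta|delta)_(\\d+)_(\\d+)_v\\d+$");

    // Groups compacted directories by prefix and write ID range, and keeps
    // only the ranges that are covered by more than one directory.
    public static Map<String, List<String>> findDuplicates(List<String> dirNames) {
        Map<String, List<String>> byRange = new TreeMap<>();
        for (String name : dirNames) {
            Matcher m = COMPACTED_DELTA.matcher(name);
            if (!m.matches()) {
                continue;
            }
            String key = m.group(1) + "_" + m.group(2) + "_" + m.group(3);
            byRange.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        // Drop ranges that have a single directory; those are not duplicates.
        byRange.values().removeIf(dirs -> dirs.size() < 2);
        return byRange;
    }

    public static void main(String[] args) {
        // Directory names copied from the listing in the issue description.
        List<String> listing = Arrays.asList(
            "delete_delta_0000001_0000002_v0000019",
            "delete_delta_0000001_0000002_v0000021",
            "delta_0000001_0000001_0000",
            "delta_0000001_0000002_v0000019",
            "delta_0000001_0000002_v0000021",
            "delta_0000002_0000002_0000");
        findDuplicates(listing).forEach(
            (range, dirs) -> System.out.println(range + " -> " + dirs));
    }
}
```

Run against the listing from the description, this reports two duplicated ranges, one for `delta` and one for `delete_delta`, each with the `v0000019` and `v0000021` directories. A check of this kind could serve as a test assertion after a second compaction request; whether the actual fix belongs in the compactor's "nothing to do" detection is for the attached patch to decide.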