[ https://issues.apache.org/jira/browse/HIVE-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16810933#comment-16810933 ]
Hive QA commented on HIVE-20901:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12964935/HIVE-20901.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 15893 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.ql.txn.compactor.TestWorker.minorWithOpenInMiddle (batchId=297)
org.apache.hadoop.hive.ql.txn.compactor.TestWorker2.minorWithOpenInMiddle (batchId=296)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/16861/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/16861/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-16861/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 2 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12964935 - PreCommit-HIVE-Build

> running compactor when there is nothing to do produces duplicate data
> ---------------------------------------------------------------------
>
>                 Key: HIVE-20901
>                 URL: https://issues.apache.org/jira/browse/HIVE-20901
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 4.0.0
>            Reporter: Eugene Koifman
>            Assignee: Abhishek Somani
>            Priority: Major
>         Attachments: HIVE-20901.1.patch
>
>
> suppose we run minor compaction 2 times, via alter table
> The 2nd request to compaction should have nothing to do but I don't think
> there is a check for that. It's visible in the context of HIVE-20823, where
> each compactor run produces a delta with new visibility suffix so we end up
> with something like
> {noformat}
> target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands3-1541810844849/warehouse/t/
> ├── delete_delta_0000001_0000002_v0000019
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delete_delta_0000001_0000002_v0000021
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delta_0000001_0000001_0000
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delta_0000001_0000002_v0000019
> │   ├── _orc_acid_version
> │   └── bucket_00000
> ├── delta_0000001_0000002_v0000021
> │   ├── _orc_acid_version
> │   └── bucket_00000
> └── delta_0000002_0000002_0000
>     ├── _orc_acid_version
>     └── bucket_00000{noformat}
> i.e. 2 deltas with the same write ID range
> this is bad. Probably happens today as well but new run produces a delta
> with the same name and clobbers the previous one, which may interfere with
> writers
>
> need to investigate
>
> -The issue (I think) is that {{AcidUtils.getAcidState()}} then returns both
> deltas as if they were distinct and it effectively duplicates data.- There
> is no data duplication - {{getAcidState()}} will not use 2 deltas with the
> same {{writeid}} range
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
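
To make the "2 deltas with the same write ID range" condition in the quoted description concrete, here is a minimal Java sketch that groups compacted directory names by type and write ID range and reports any range that appears more than once. The class `DeltaRangeCheck`, the method `findDuplicateRanges`, and the regular expression are all hypothetical: the name pattern is inferred only from the directory listing in this issue, and none of this is Hive's own parsing code or the logic in `AcidUtils.getAcidState()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DeltaRangeCheck {
    // Assumed name shape, inferred from the listing in this issue:
    //   delta_<minWriteId>_<maxWriteId>_v<visibilityId>
    //   delete_delta_<minWriteId>_<maxWriteId>_v<visibilityId>
    // This is an illustration, not Hive's actual directory-name parser.
    private static final Pattern COMPACTED =
        Pattern.compile("^(delete_delta|delta)_(\\d+)_(\\d+)_v(\\d+)$");

    /**
     * Groups directories that carry a visibility suffix by type and write ID
     * range, and keeps only the ranges that occur more than once.
     */
    public static Map<String, List<String>> findDuplicateRanges(List<String> dirNames) {
        Map<String, List<String>> byRange = new TreeMap<>();
        for (String name : dirNames) {
            Matcher m = COMPACTED.matcher(name);
            if (!m.matches()) {
                // Names without a _v suffix (e.g. delta_0000001_0000001_0000) are skipped.
                continue;
            }
            String key = m.group(1) + "_" + m.group(2) + "_" + m.group(3);
            byRange.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
        }
        // Drop ranges that are covered by a single directory.
        byRange.values().removeIf(dirs -> dirs.size() < 2);
        return byRange;
    }

    public static void main(String[] args) {
        // Directory names copied from the listing in the issue description.
        List<String> dirs = List.of(
            "delete_delta_0000001_0000002_v0000019",
            "delete_delta_0000001_0000002_v0000021",
            "delta_0000001_0000001_0000",
            "delta_0000001_0000002_v0000019",
            "delta_0000001_0000002_v0000021",
            "delta_0000002_0000002_0000");
        findDuplicateRanges(dirs).forEach((range, names) ->
            System.out.println(range + " appears in: " + names));
    }
}
```

A check along these lines only detects the redundant directories after the fact; whether the compactor should skip the second request altogether is the open question in the issue.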