[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15090488#comment-15090488 ]
Hive QA commented on HIVE-12724:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12781021/HIVE-12724.3.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 9985 tests executed

*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
org.apache.hive.spark.client.TestSparkClient.testCounters
org.apache.hive.spark.client.TestSparkClient.testErrorJob
org.apache.hive.spark.client.TestSparkClient.testJobSubmission
org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
org.apache.hive.spark.client.TestSparkClient.testRemoteClient
org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
org.apache.hive.spark.client.TestSparkClient.testSyncRpc
{noformat}

Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6558/testReport
Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6558/console
Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6558/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12781021 - PreCommit-HIVE-TRUNK-Build

> ACID: Major compaction fails to include the original bucket files into MR job
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-12724
>                 URL: https://issues.apache.org/jira/browse/HIVE-12724
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>         Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, HIVE-12724.3.patch
>
> How the problem happens:
> * Create a non-ACID table
> * Before the non-ACID to ACID table conversion, insert row one
> * After the non-ACID to ACID table conversion, insert row two
> * Both rows can be retrieved before MAJOR compaction
> * After MAJOR compaction, row one is lost
> {code}
> hive> USE acidtest;
> OK
> Time taken: 0.77 seconds
> hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment STRING)
>     > CLUSTERED BY (regionkey) INTO 2 BUCKETS
>     > STORED AS ORC;
> OK
> Time taken: 0.179 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name              data_type           comment
> nationkey               int
> name                    string
> regionkey               int
> comment                 string
>
> # Detailed Table Information
> Database:               acidtest
> Owner:                  wzheng
> CreateTime:             Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:         UNKNOWN
> Retention:              0
> Location:               file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:             MANAGED_TABLE
> Table Parameters:
>         transient_lastDdlTime   1450137040
>
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:             No
> Num Buckets:            2
> Bucket Columns:         [regionkey]
> Sort Columns:           []
> Storage Desc Params:
>         serialization.format    1
> Time taken: 0.198 seconds, Fetched: 28 row(s)
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
> Found 1 items
> drwxr-xr-x   - wzheng staff         68 2015-12-14 15:50 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Job running in-process (local Hadoop)
> 2015-12-14 15:51:58,070 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_local73977356_0001
> Loading data to table acidtest.t1
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 2.825 seconds
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 2 items
> -rwxr-xr-x   1 wzheng staff        112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
> -rwxr-xr-x   1 wzheng staff        472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
> hive> SELECT * FROM t1;
> OK
> 1     USA     1       united states
> Time taken: 0.434 seconds, Fetched: 1 row(s)
> hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
> OK
> Time taken: 0.071 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name              data_type           comment
> nationkey               int
> name                    string
> regionkey               int
> comment                 string
>
> # Detailed Table Information
> Database:               acidtest
> Owner:                  wzheng
> CreateTime:             Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:         UNKNOWN
> Retention:              0
> Location:               file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:             MANAGED_TABLE
> Table Parameters:
>         COLUMN_STATS_ACCURATE   false
>         last_modified_by        wzheng
>         last_modified_time      1450137141
>         numFiles                2
>         numRows                 -1
>         rawDataSize             -1
>         totalSize               584
>         transactional           true
>         transient_lastDdlTime   1450137141
>
> # Storage Information
> SerDe Library:          org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:            org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat:           org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:             No
> Num Buckets:            2
> Bucket Columns:         [regionkey]
> Sort Columns:           []
> Storage Desc Params:
>         serialization.format    1
> Time taken: 0.049 seconds, Fetched: 36 row(s)
> hive> set hive.support.concurrency=true;
> hive> set hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
> hive> set hive.compactor.initiator.on=true;
> hive> set hive.compactor.worker.threads=5;
> hive> set hive.exec.dynamic.partition.mode=nonstrict;
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 2 items
> -rwxr-xr-x   1 wzheng staff        112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
> -rwxr-xr-x   1 wzheng staff        472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
> hive> INSERT INTO TABLE t1 VALUES (2, 'Canada', 1, 'maple leaf');
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Job running in-process (local Hadoop)
> 2015-12-14 15:54:18,943 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_local1674014367_0002
> Loading data to table acidtest.t1
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 1.995 seconds
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 3 items
> -rwxr-xr-x   1 wzheng staff        112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
> -rwxr-xr-x   1 wzheng staff        472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
> drwxr-xr-x   - wzheng staff        204 2015-12-14 15:54 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000;
> Found 2 items
> -rw-r--r--   1 wzheng staff        214 2015-12-14 15:54 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000/bucket_00000
> -rw-r--r--   1 wzheng staff        797 2015-12-14 15:54 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/delta_0000007_0000007_0000/bucket_00001
> hive> SELECT * FROM t1;
> OK
> 1     USA     1       united states
> 2     Canada  1       maple leaf
> Time taken: 0.1 seconds, Fetched: 2 row(s)
> hive> ALTER TABLE t1 COMPACT 'MAJOR';
> Compaction enqueued.
> OK
> Time taken: 0.026 seconds
> hive> show compactions;
> OK
> Database  Table  Partition  Type  State  Worker  Start Time
> Time taken: 0.022 seconds, Fetched: 1 row(s)
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/;
> Found 3 items
> -rwxr-xr-x   1 wzheng staff        112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
> -rwxr-xr-x   1 wzheng staff        472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
> drwxr-xr-x   - wzheng staff        204 2015-12-14 15:55 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007;
> Found 2 items
> -rw-r--r--   1 wzheng staff        222 2015-12-14 15:55 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007/bucket_00000
> -rw-r--r--   1 wzheng staff        802 2015-12-14 15:55 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/base_0000007/bucket_00001
> hive> select * from t1;
> OK
> 2     Canada  1       maple leaf
> Time taken: 0.396 seconds, Fetched: 1 row(s)
> hive> select count(*) from t1;
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Job running in-process (local Hadoop)
> 2015-12-14 15:56:20,277 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_local1720993786_0003
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> 1
> Time taken: 1.623 seconds, Fetched: 1 row(s)
> {code}
> Note: the cleanup doesn't kick in because the compaction has already failed. The cleanup itself doesn't have any problem (at least none that we know of for this case).

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
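[Editor's note] The file-layout distinction at the heart of this bug can be sketched as follows. This is a minimal, hypothetical Python sketch, not Hive's actual AcidUtils/CompactorMR code; the regexes, the `compaction_inputs` helper, and the transaction-id parsing are illustrative assumptions. It classifies the entries of a table directory (matching the layout in the transcript above) into pre-conversion original bucket files, delta directories, and base directories, and shows that when no base exists yet, a major compaction's input set must include the originals in addition to the deltas; reading only the deltas is exactly what drops row one.

```python
import re

# Hypothetical classification of ACID table directory entries; the names
# mirror the layout in the transcript, not Hive's real implementation.
ORIGINAL = re.compile(r"^\d{6}_\d+$")            # e.g. 000000_0 (pre-ACID bucket file)
DELTA = re.compile(r"^delta_\d+_\d+(_\d+)?$")    # e.g. delta_0000007_0000007_0000
BASE = re.compile(r"^base_\d+$")                 # e.g. base_0000007

def compaction_inputs(entries):
    """Return the directory entries a major compaction should read.

    With no base present, that is every original bucket file plus every
    delta directory. Reading only the deltas (the bug described here)
    silently drops rows written before the non-ACID to ACID conversion.
    """
    originals = [e for e in entries if ORIGINAL.match(e)]
    deltas = [e for e in entries if DELTA.match(e)]
    # Zero-padded names make lexicographic sort match numeric txn order.
    bases = sorted(e for e in entries if BASE.match(e))
    if bases:
        latest = bases[-1]
        base_txn = int(latest.split("_")[1])
        # Only deltas whose high-watermark txn exceeds the base are unmerged.
        newer = [d for d in deltas if int(d.split("_")[2]) > base_txn]
        return [latest] + newer
    return originals + deltas

# Layout from the transcript just before compaction:
table_dir = ["000000_0", "000001_0", "delta_0000007_0000007_0000"]
print(compaction_inputs(table_dir))
# -> ['000000_0', '000001_0', 'delta_0000007_0000007_0000']
```

Under this sketch, the buggy compaction effectively returns only `['delta_0000007_0000007_0000']` for the pre-compaction layout, so the resulting `base_0000007` contains row two but not row one, matching the `select count(*)` result of 1 above.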