[ https://issues.apache.org/jira/browse/HIVE-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Saket Saurabh updated HIVE-14366: --------------------------------- Description: When a Non-ACID table is converted to an ACID table, the primary key consisting of (original transaction id, bucket_id, row_id) is not generated uniquely. Currently, the row_id is always set to 0 for most rows. This leads to correctness issue for such tables. Quickest way to reproduce is to add the following unit test to ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} @Test public void testOriginalReader() throws Exception { FileSystem fs = FileSystem.get(hiveConf); FileStatus[] status; // 1. Insert five rows to Non-ACID table. runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) values(1,2),(3,4),(5,6),(7,8),(9,10)"); // 2. Convert NONACIDORCTBL to ACID table. runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET TBLPROPERTIES ('transactional'='true')"); // 3. Perform a major compaction. runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact 'MAJOR'"); runWorker(hiveConf); // 4. Perform a delete. runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1"); // 5. Now do a projection should have (3,4) (5,6),(7,8),(9,10) only since (1,2) has been deleted. List<String> rs = runStatementOnDriver("select a,b from " + Table.NONACIDORCTBL + " order by a,b"); int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}}; Assert.assertEquals(stringifyValues(resultData), rs); } {code} was: When a Non-ACID table is converted to an ACID table, the primary key consisting of (original transaction id, bucket_id, row_id) is not generated uniquely. Currently, the row_id is always set to 0 for most rows. This leads to correctness issue for such tables. Quickest way to reproduce is to add the following unit test to ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} @Test public void testOriginalReader() throws Exception { FileSystem fs = FileSystem.get(hiveConf); FileStatus[] status; // 1. Insert five rows to Non-ACID table. runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) values(1,2),(3,4),(5,6),(7,8),(9,10)"); // 2. Convert NONACIDORCTBL to ACID table. runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET TBLPROPERTIES ('transactional'='true')"); // 3. Perform a major compaction. runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact 'MAJOR'"); runWorker(hiveConf); // 3. Perform a delete. runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1"); // Now do a projection should have (3,4) (5,6),(7,8),(9,10) only since (1,2) has been deleted. List<String> rs = runStatementOnDriver("select a,b from " + Table.NONACIDORCTBL + " order by a,b"); int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}}; Assert.assertEquals(stringifyValues(resultData), rs); } {code} > Conversion of a Non-ACID table to an ACID table produces non-unique primary > keys > -------------------------------------------------------------------------------- > > Key: HIVE-14366 > URL: https://issues.apache.org/jira/browse/HIVE-14366 > Project: Hive > Issue Type: Bug > Components: Transactions > Reporter: Saket Saurabh > > When a Non-ACID table is converted to an ACID table, the primary key > consisting of (original transaction id, bucket_id, row_id) is not generated > uniquely. Currently, the row_id is always set to 0 for most rows. This leads > to correctness issue for such tables. > Quickest way to reproduce is to add the following unit test to > ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java > {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid} > @Test > public void testOriginalReader() throws Exception { > FileSystem fs = FileSystem.get(hiveConf); > FileStatus[] status; > // 1. Insert five rows to Non-ACID table. > runStatementOnDriver("insert into " + Table.NONACIDORCTBL + "(a,b) > values(1,2),(3,4),(5,6),(7,8),(9,10)"); > // 2. Convert NONACIDORCTBL to ACID table. > runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " SET > TBLPROPERTIES ('transactional'='true')"); > // 3. Perform a major compaction. > runStatementOnDriver("alter table "+ Table.NONACIDORCTBL + " compact > 'MAJOR'"); > runWorker(hiveConf); > // 4. Perform a delete. > runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = > 1"); > // 5. Now do a projection should have (3,4) (5,6),(7,8),(9,10) only since > (1,2) has been deleted. > List<String> rs = runStatementOnDriver("select a,b from " + > Table.NONACIDORCTBL + " order by a,b"); > int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}}; > Assert.assertEquals(stringifyValues(resultData), rs); > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)