[jira] [Updated] (HIVE-14366) Conversion of a Non-ACID table to an ACID table produces non-unique primary keys

Saket Saurabh (JIRA) Wed, 27 Jul 2016 19:18:57 -0700

     [ https://issues.apache.org/jira/browse/HIVE-14366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Saket Saurabh updated HIVE-14366:
---------------------------------
    Description: 
When a Non-ACID table is converted to an ACID table, the primary key consisting 
of (original transaction id, bucket_id, row_id) is not generated uniquely: 
currently, row_id is set to 0 for most rows. This leads to correctness issues 
for such tables.

The quickest way to reproduce this is to add the following unit test to 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
  @Test
  public void testOriginalReader() throws Exception {
    FileSystem fs = FileSystem.get(hiveConf);
    FileStatus[] status;

    // 1. Insert five rows to Non-ACID table.
    runStatementOnDriver("insert into " + Table.NONACIDORCTBL
        + "(a,b) values(1,2),(3,4),(5,6),(7,8),(9,10)");

    // 2. Convert NONACIDORCTBL to ACID table.
    runStatementOnDriver("alter table " + Table.NONACIDORCTBL
        + " SET TBLPROPERTIES ('transactional'='true')");

    // 3. Perform a major compaction.
    runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " compact 'MAJOR'");
    runWorker(hiveConf);

    // 4. Perform a delete.
    runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1");

    // 5. A projection should now return only (3,4),(5,6),(7,8),(9,10),
    //    since (1,2) has been deleted.
    List<String> rs = runStatementOnDriver("select a,b from "
        + Table.NONACIDORCTBL + " order by a,b");
    int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
    Assert.assertEquals(stringifyValues(resultData), rs);
  }
{code}

  was:
When a Non-ACID table is converted to an ACID table, the primary key consisting 
of (original transaction id, bucket_id, row_id) is not generated uniquely: 
currently, row_id is set to 0 for most rows. This leads to correctness issues 
for such tables.

The quickest way to reproduce this is to add the following unit test to 
ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:

{code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
  @Test
  public void testOriginalReader() throws Exception {
    FileSystem fs = FileSystem.get(hiveConf);
    FileStatus[] status;

    // 1. Insert five rows to Non-ACID table.
    runStatementOnDriver("insert into " + Table.NONACIDORCTBL
        + "(a,b) values(1,2),(3,4),(5,6),(7,8),(9,10)");

    // 2. Convert NONACIDORCTBL to ACID table.
    runStatementOnDriver("alter table " + Table.NONACIDORCTBL
        + " SET TBLPROPERTIES ('transactional'='true')");

    // 3. Perform a major compaction.
    runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " compact 'MAJOR'");
    runWorker(hiveConf);

    // 3. Perform a delete.
    runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1");

    // A projection should now return only (3,4),(5,6),(7,8),(9,10),
    // since (1,2) has been deleted.
    List<String> rs = runStatementOnDriver("select a,b from "
        + Table.NONACIDORCTBL + " order by a,b");
    int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
    Assert.assertEquals(stringifyValues(resultData), rs);
  }
{code}


> Conversion of a Non-ACID table to an ACID table produces non-unique primary keys
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-14366
>                 URL: https://issues.apache.org/jira/browse/HIVE-14366
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Saket Saurabh
>
> When a Non-ACID table is converted to an ACID table, the primary key 
> consisting of (original transaction id, bucket_id, row_id) is not generated 
> uniquely: currently, row_id is set to 0 for most rows. This leads to 
> correctness issues for such tables.
> The quickest way to reproduce this is to add the following unit test to 
> ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java:
> {code:title=ql/src/test/org/apache/hadoop/hive/ql/TestTxnCommands2.java|borderStyle=solid}
>   @Test
>   public void testOriginalReader() throws Exception {
>     FileSystem fs = FileSystem.get(hiveConf);
>     FileStatus[] status;
>     // 1. Insert five rows to Non-ACID table.
>     runStatementOnDriver("insert into " + Table.NONACIDORCTBL
>         + "(a,b) values(1,2),(3,4),(5,6),(7,8),(9,10)");
>     // 2. Convert NONACIDORCTBL to ACID table.
>     runStatementOnDriver("alter table " + Table.NONACIDORCTBL
>         + " SET TBLPROPERTIES ('transactional'='true')");
>     // 3. Perform a major compaction.
>     runStatementOnDriver("alter table " + Table.NONACIDORCTBL + " compact 'MAJOR'");
>     runWorker(hiveConf);
>     // 4. Perform a delete.
>     runStatementOnDriver("delete from " + Table.NONACIDORCTBL + " where a = 1");
>     // 5. A projection should now return only (3,4),(5,6),(7,8),(9,10),
>     //    since (1,2) has been deleted.
>     List<String> rs = runStatementOnDriver("select a,b from "
>         + Table.NONACIDORCTBL + " order by a,b");
>     int[][] resultData = new int[][] {{3,4}, {5,6}, {7,8}, {9,10}};
>     Assert.assertEquals(stringifyValues(resultData), rs);
>   }
> {code}
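
To illustrate why a non-unique (original transaction id, bucket_id, row_id) key is a correctness problem, here is a minimal, self-contained Java sketch. It is not Hive code: `RowKey`, `assignKeys`, and `survivorsAfterDelete` are hypothetical names, the key values are arbitrary, and the sketch assumes that a delete is recorded by row key and then hides every row carrying that key. It also simplifies the report's "most rows" to "all rows".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class RowKeyCollisionSketch {

  /** Hypothetical stand-in for the (original transaction id, bucket_id, row_id) key. */
  static final class RowKey {
    final long originalTxnId;
    final int bucketId;
    final long rowId;

    RowKey(long originalTxnId, int bucketId, long rowId) {
      this.originalTxnId = originalTxnId;
      this.bucketId = bucketId;
      this.rowId = rowId;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof RowKey)) {
        return false;
      }
      RowKey other = (RowKey) o;
      return originalTxnId == other.originalTxnId
          && bucketId == other.bucketId
          && rowId == other.rowId;
    }

    @Override
    public int hashCode() {
      return Objects.hash(originalTxnId, bucketId, rowId);
    }
  }

  /**
   * Assigns a key to each of rowCount rows in a single bucket.
   * With uniqueRowIds == false every row gets row_id 0, mimicking
   * (in simplified form) the symptom described in the report.
   */
  static List<RowKey> assignKeys(int rowCount, boolean uniqueRowIds) {
    List<RowKey> keys = new ArrayList<>();
    for (int i = 0; i < rowCount; i++) {
      keys.add(new RowKey(0L, 0, uniqueRowIds ? i : 0L));
    }
    return keys;
  }

  /**
   * Records a delete for the row at deletedIndex by its key (assumption:
   * deletes are matched by key) and counts the rows that stay visible.
   */
  static int survivorsAfterDelete(List<RowKey> keys, int deletedIndex) {
    RowKey deleteKey = keys.get(deletedIndex);
    int survivors = 0;
    for (RowKey key : keys) {
      if (!key.equals(deleteKey)) {
        survivors++;
      }
    }
    return survivors;
  }

  public static void main(String[] args) {
    System.out.println("unique row_ids:  "
        + survivorsAfterDelete(assignKeys(5, true), 0) + " rows survive");
    System.out.println("row_id always 0: "
        + survivorsAfterDelete(assignKeys(5, false), 0) + " rows survive");
  }
}
```

Under these assumptions, deleting one of five rows leaves 4 rows when the row_ids are unique, but leaves 0 rows when every row_id is 0, because the single delete key matches all of them. Which rows the real test loses depends on which keys actually collide.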



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
