[jira] [Updated] (HIVE-26150) OrcRawRecordMerger reads each row twice

Alessandro Solimando (Jira) Tue, 19 Apr 2022 01:08:07 -0700


    [ https://issues.apache.org/jira/browse/HIVE-26150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alessandro Solimando updated HIVE-26150:
----------------------------------------
    Description: 
OrcRawRecordMerger reads each row twice. The issue does not surface because the
merger is only used with the parameter "collapseEvents" set to true, which
filters out one of the two copies.

Setting collapseEvents to true or false should produce the same result: in the
current ACID implementation each event has a distinct rowId, so two identical
rows can never legitimately appear. They appear only because of this bug.
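The reasoning above can be illustrated with a small model. This is a hypothetical sketch under stated assumptions, not Hive's code: events are assumed to arrive sorted and to be identified by (originalWriteId, bucket, rowId), and collapseEvents is assumed to skip an event whose identity equals that of the previously emitted one. The class CollapseEventsSketch, the Event record and the merge/readEachRowTwice helpers are invented for illustration only. With distinct rowIds both settings return the same rows. When the reader emits every row twice, only collapseEvents=false exposes the duplicates.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model of an ACID event stream.
 * This is NOT Hive's OrcRawRecordMerger; it only models the
 * row-identity comparison that collapsing relies on.
 */
public class CollapseEventsSketch {

    // Hypothetical event: row identity (originalWriteId, bucket, rowId) plus a payload.
    record Event(long originalWriteId, int bucket, long rowId, String value) {
        boolean sameRow(Event other) {
            return other != null
                && originalWriteId == other.originalWriteId
                && bucket == other.bucket
                && rowId == other.rowId;
        }
    }

    /**
     * Emits the (already sorted) events. With collapseEvents=true, an event
     * whose row identity equals the previously emitted event is skipped.
     */
    static List<Event> merge(List<Event> sortedEvents, boolean collapseEvents) {
        List<Event> out = new ArrayList<>();
        Event previous = null;
        for (Event e : sortedEvents) {
            if (collapseEvents && e.sameRow(previous)) {
                continue; // same (originalWriteId, bucket, rowId) as the last emitted event
            }
            out.add(e);
            previous = e;
        }
        return out;
    }

    /** Simulates the reported bug: the underlying reader produces every event twice. */
    static List<Event> readEachRowTwice(List<Event> events) {
        List<Event> out = new ArrayList<>();
        for (Event e : events) {
            out.add(e);
            out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> rows = List.of(
            new Event(1, 0, 0, "first"),
            new Event(1, 0, 1, "second"));

        // Correct reader: rowIds are distinct, so both settings agree.
        System.out.println(merge(rows, true).equals(merge(rows, false)));

        // Buggy reader: collapsing hides the duplicates, not collapsing exposes them.
        List<Event> buggy = readEachRowTwice(rows);
        System.out.println(merge(buggy, true).size());
        System.out.println(merge(buggy, false).size());
    }
}
```

Running main (Java 16+ is needed for records) prints true, then 2, then 4. The mismatch between 2 and 4 mirrors the two test failures seen once collapsing is disabled.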

To reproduce the issue, set the second parameter to false
[here|https://github.com/apache/hive/blob/61d4ff2be48b20df9fd24692c372ee9c2606babe/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2103-L2106]
and run the tests in TestOrcRawRecordMerger; two of them fail:

{code:bash}
mvn test -Dtest=TestOrcRawRecordMerger -pl ql
{code}

{noformat}
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR]   TestOrcRawRecordMerger.testRecordReaderNewBaseAndDelta:1332 Found unexpected row: (0,ignore.1)
[ERROR]   TestOrcRawRecordMerger.testRecordReaderOldBaseAndDelta:1208 Found unexpected row: (0,ignore.1)
{noformat}


  was:
OrcRawRecordMerger reads each row twice. The issue does not surface because the
merger is only used with the parameter "collapseEvents" set to true, which
filters out one of the two copies.

Setting collapseEvents to true or false should produce the same result: in the
current ACID implementation each event has a distinct rowId, so two identical
rows can never legitimately appear. They appear only because of this bug.

To reproduce the issue, set the second parameter to false
[here|https://github.com/apache/hive/blob/61d4ff2be48b20df9fd24692c372ee9c2606babe/ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcInputFormat.java#L2103-L2106]
and run the tests in TestOrcRawRecordMerger; two of them fail.


> OrcRawRecordMerger reads each row twice
> ---------------------------------------
>
>                 Key: HIVE-26150
>                 URL: https://issues.apache.org/jira/browse/HIVE-26150
>             Project: Hive
>          Issue Type: Bug
>          Components: ORC, Transactions
>    Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
