[ https://issues.apache.org/jira/browse/HIVE-26147?focusedWorklogId=758534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-758534 ]
ASF GitHub Bot logged work on HIVE-26147:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 19/Apr/22 15:07
            Start Date: 19/Apr/22 15:07
    Worklog Time Spent: 10m
      Work Description: klcopp merged PR #3219:
URL: https://github.com/apache/hive/pull/3219

Issue Time Tracking
-------------------

    Worklog Id:     (was: 758534)
    Time Spent: 0.5h  (was: 20m)

> OrcRawRecordMerger throws NPE when hive.acid.key.index is missing for an acid file
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-26147
>                 URL: https://issues.apache.org/jira/browse/HIVE-26147
>             Project: Hive
>          Issue Type: Bug
>          Components: ORC, Transactions
>   Affects Versions: 4.0.0-alpha-2
>            Reporter: Alessandro Solimando
>            Assignee: Alessandro Solimando
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When _hive.acid.key.index_ is missing for an acid ORC file
> _OrcRawRecordMerger_ throws as follows:
> {noformat}
> Caused by: java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.discoverKeyBounds(OrcRawRecordMerger.java:795) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.io.orc.OrcRawRecordMerger.<init>(OrcRawRecordMerger.java:1053) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:2096) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1991) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:769) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:335) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:560) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:529) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:150) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.Driver.getFetchingTableResults(Driver.java:719) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:671) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hadoop.hive.ql.reexec.ReExecDriver.getResults(ReExecDriver.java:233) ~[hive-exec-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:489) ~[hive-service-4.0.0-alpha-2-SNAPSHOT.jar:4.0.0-alpha-2-SNAPSHOT]
>     ... 24 more
> {noformat}
> For this situation to happen, the ORC file must have more than one stripe,
> and the offset of the element to seek should either locate it beyond the
> first stripe (but before the last one), or in the first one if not the last
> one, as the code shows:
> {code:java}
> if (firstStripe != 0) {
>   minKey = keyIndex[firstStripe - 1];
> }
> if (!isTail) {
>   maxKey = keyIndex[firstStripe + stripeCount - 1];
> }
> {code}
> However, in the context of the detection of the original issue, the NPE was
> triggered even by a simple "select *" over a table with ORC files missing the
> _hive.acid.key.index_ metadata information, but it was never failing for ORC
> files with a single stripe. The file was generated after a major compaction
> of acid and non-acid data.
> If the "select *" is not triggering the NPE, either pick the values of the
> row obtained with "select * from $table limit 1", or try to select based on
> different values trying to get into the sought situation with a filter like
> this:
> {code:sql}
> select * from $table where c = $value
> {code}
> _OrcRawRecordMerger_ should simply leave as "null" the min and max keys when
> the _hive.acid.key.index_ metadata is missing.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
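
The behaviour proposed in the description (leave the min and max keys as null when the key index is absent) can be illustrated with a small standalone sketch. This is not the code merged in PR #3219: the class name `KeyBoundsSketch`, the method signature, and the use of plain strings in place of Hive's key type are all assumptions made for illustration, and a null `keyIndex` is assumed to stand for a file without _hive.acid.key.index_.

```java
import java.util.Arrays;

// Hypothetical standalone model of the key-bound lookup quoted in the issue.
// Keys are plain strings here; the real code works on Hive's own key objects.
public class KeyBoundsSketch {

    /** Returns {minKey, maxKey}; a null entry means "no bound on that side". */
    public static String[] discoverKeyBounds(String[] keyIndex, int firstStripe,
                                             int stripeCount, boolean isTail) {
        String minKey = null;
        String maxKey = null;
        // Assumption: keyIndex == null models missing hive.acid.key.index metadata.
        // Without this guard the array accesses below would throw an NPE.
        if (keyIndex != null) {
            if (firstStripe != 0) {
                minKey = keyIndex[firstStripe - 1];
            }
            if (!isTail) {
                maxKey = keyIndex[firstStripe + stripeCount - 1];
            }
        }
        return new String[] {minKey, maxKey};
    }

    public static void main(String[] args) {
        // Three stripes, each entry being the last key of that stripe.
        String[] index = {"k10", "k20", "k30"};
        // Split covering only the middle stripe: both bounds come from the index.
        System.out.println(Arrays.toString(discoverKeyBounds(index, 1, 1, false)));
        // Same split, index missing: both bounds stay null instead of failing.
        System.out.println(Arrays.toString(discoverKeyBounds(null, 1, 1, false)));
    }
}
```

The sketch also shows why single-stripe files never failed: with `firstStripe == 0` and `isTail == true`, neither branch touches `keyIndex`, so a missing index is never dereferenced.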