[ https://issues.apache.org/jira/browse/HIVE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ádám Szita updated HIVE-24683:
------------------------------
Description:
HIVE-23840 introduced the feature of reading delete deltas from the LLAP cache when they are available there. This refactor opened an opportunity for an NPE:
{code:java}
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
	at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
	at org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581)
{code}
ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket from the bucket number of the corresponding split, but this file may not exist if no deletions ever happened in that particular bucket.

Earlier this was handled by always trying to open an ORC reader on the path and catching the resulting FileNotFoundException. After the refactor, however, we first look into the cache, and for that we must retrieve a file ID. This entails a getFileStatus call on HDFS, which returns null for non-existing paths and eventually causes the NPE.

This was later fixed by HIVE-23956; nevertheless, Hadoop23Shims.getFileId should be refactored so that it is no longer error prone.

was:
HIVE-23840 introduced the feature of reading delete deltas from the LLAP cache when they are available there. This refactor opened an opportunity for an NPE:
{code:java}
Caused by: java.lang.NullPointerException
	at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
	at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
	at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
	at org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
	at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581)
{code}
ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket from the bucket number of the corresponding split, but this file may not exist if no deletions ever happened in that particular bucket.

Earlier this was handled by always trying to open an ORC reader on the path and catching the resulting FileNotFoundException. After the refactor, however, we first look into the cache, and for that we must retrieve a file ID. This entails a getFileStatus call on HDFS, which returns null for non-existing paths and eventually causes the NPE.

This needs to be wrapped with a null check in Hadoop23Shims.
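For context, the pre-refactor handling mentioned above followed roughly the pattern in this sketch: open a reader directly on the inferred delete delta bucket path and treat a missing file as "no delete events in this bucket". The class and method names are illustrative, not Hive's actual code; only OrcFile.createReader and OrcFile.readerOptions are real ORC APIs.
{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.orc.OrcFile;
import org.apache.orc.Reader;

// Illustrative sketch of the pre-refactor pattern: probe the inferred
// delete delta bucket file by opening it, and treat a missing file as
// "no delete events in this bucket" rather than as an error.
public final class DeleteDeltaProbeSketch {
  private DeleteDeltaProbeSketch() {}

  /** Returns a Reader for the bucket file, or null when the file does not exist. */
  public static Reader tryOpenDeleteDeltaBucket(Path bucketPath, Configuration conf)
      throws IOException {
    try {
      return OrcFile.createReader(bucketPath, OrcFile.readerOptions(conf));
    } catch (FileNotFoundException fnfe) {
      // No row was ever deleted from this bucket, so the delete delta
      // file was never written; this is a normal, expected condition.
      return null;
    }
  }
}
{code}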
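And a minimal sketch of the kind of null-safe getFileId the description asks for, assuming the shim resolves a file status before deriving an ID. The class name and the ID derivation are hypothetical placeholders; the real shim extracts an HDFS inode-based file ID.
{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch, not the actual Hadoop23Shims code: make the
// existence check explicit instead of dereferencing a possibly-null
// file status for a non-existing path.
public final class NullSafeFileIdSketch {
  private NullSafeFileIdSketch() {}

  /**
   * Returns a file ID for the path, or null when the path does not exist,
   * so callers can fall back to the non-cache read path instead of
   * triggering an NPE.
   */
  public static Long getFileId(FileSystem fs, Path path) throws IOException {
    FileStatus status;
    try {
      status = fs.getFileStatus(path);
    } catch (FileNotFoundException fnfe) {
      // FileSystem implementations that throw for a missing path
      // rather than returning null end up on the same code path.
      status = null;
    }
    if (status == null) {
      return null; // non-existing path: no file ID to compute
    }
    // Placeholder for deriving the ID; the real shim reads the HDFS
    // inode-based file ID from the underlying file status.
    return (long) status.getPath().toUri().hashCode();
  }
}
{code}
Funneling both the null-returning and the exception-throwing lookups into a single explicit "missing path" result keeps the existence check in one place, so callers such as OrcEncodedDataReader.determineFileId can skip the cache for absent delete delta buckets.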
> Hadoop23Shims getFileId prone to NPE for non-existing paths
> -----------------------------------------------------------
>
>                 Key: HIVE-24683
>                 URL: https://issues.apache.org/jira/browse/HIVE-24683
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)