[ 
https://issues.apache.org/jira/browse/HIVE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ádám Szita reassigned HIVE-24683:
---------------------------------


> NPE in Hadoop23Shims due to non-existing delete delta paths
> -----------------------------------------------------------
>
>                 Key: HIVE-24683
>                 URL: https://issues.apache.org/jira/browse/HIVE-24683
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>
> HIVE-23840 introduced reading delete deltas from the LLAP cache when it is 
> available. That refactor made the following NPE possible:
> {code:java}
> Caused by: java.lang.NullPointerException
> at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
> at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
> at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
> at org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
> at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581){code}
> ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket 
> by looking at the bucket number (from the corresponding split), but this file 
> may not exist if no deletion happened in that particular bucket.
> Earlier this was handled by always trying to open an ORC reader on the path 
> and catching FileNotFoundException. After the refactor, however, we first 
> look into the cache, and for that we need to retrieve a file ID. This 
> entails a getFileStatus call on HDFS, which returns null for non-existing 
> paths and eventually causes the NPE.
> The file ID lookup in Hadoop23Shims needs to be guarded by a null check.
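
The null check described above could look roughly like the sketch below. It assumes that a missing path should surface as a FileNotFoundException, so that callers which already handle a missing delete delta file keep working. FakeClient, FileInfo and this getFileId are hypothetical stand-ins written for illustration; they are not the actual Hadoop23Shims or HDFS client code.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

final class FileIdNullCheckSketch {

    /** Hypothetical stand-in for the file status object returned by the lookup. */
    public static final class FileInfo {
        private final long fileId;

        public FileInfo(long fileId) {
            this.fileId = fileId;
        }

        public long getFileId() {
            return fileId;
        }
    }

    /** Hypothetical stand-in for the HDFS client: returns null for unknown paths. */
    public static final class FakeClient {
        private final Map<String, FileInfo> files = new HashMap<>();

        public void add(String path, long fileId) {
            files.put(path, new FileInfo(fileId));
        }

        public FileInfo getFileInfo(String path) {
            return files.get(path); // null when the path does not exist
        }
    }

    /**
     * Looks up the file ID, turning a null lookup result into a
     * FileNotFoundException instead of letting it become an NPE.
     */
    public static long getFileId(FakeClient client, String path)
            throws FileNotFoundException {
        FileInfo info = client.getFileInfo(path);
        if (info == null) {
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return info.getFileId();
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        client.add("/warehouse/t/delete_delta_1_1/bucket_00000", 42L);
        // bucket_00001 is deliberately absent: no deletes happened in that bucket.
        String[] paths = {
            "/warehouse/t/delete_delta_1_1/bucket_00000",
            "/warehouse/t/delete_delta_1_1/bucket_00001"
        };
        for (String path : paths) {
            try {
                System.out.println(path + " -> file ID " + getFileId(client, path));
            } catch (FileNotFoundException e) {
                // Same handling as for a failed ORC reader open: skip this bucket.
                System.out.println(path + " -> no delete delta, skipping");
            }
        }
    }
}
```

Throwing FileNotFoundException is only one option; returning a sentinel value or an Optional would also avoid the NPE, but would require the callers to check for it.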



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
