[ https://issues.apache.org/jira/browse/HIVE-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ádám Szita reassigned HIVE-24683:
---------------------------------

> NPE in Hadoop23Shims due to non-existing delete delta paths
> -----------------------------------------------------------
>
>                 Key: HIVE-24683
>                 URL: https://issues.apache.org/jira/browse/HIVE-24683
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>
> HIVE-23840 introduced the feature of reading delete deltas from the LLAP
> cache when it is available. That refactor makes the following NPE possible:
> {code:java}
> Caused by: java.lang.NullPointerException
>   at org.apache.hadoop.hive.shims.Hadoop23Shims.getFileId(Hadoop23Shims.java:1410)
>   at org.apache.hadoop.hive.ql.io.HdfsUtils.getFileId(HdfsUtils.java:55)
>   at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.determineFileId(OrcEncodedDataReader.java:509)
>   at org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.getOrcTailForPath(OrcEncodedDataReader.java:579)
>   at org.apache.hadoop.hive.llap.io.api.impl.LlapIoImpl.getOrcTailFromCache(LlapIoImpl.java:322)
>   at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.getOrcTail(VectorizedOrcAcidRowBatchReader.java:683)
>   at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader.access$500(VectorizedOrcAcidRowBatchReader.java:82)
>   at org.apache.hadoop.hive.ql.io.orc.VectorizedOrcAcidRowBatchReader$ColumnizedDeleteEventRegistry.<init>(VectorizedOrcAcidRowBatchReader.java:1581)
> {code}
> ColumnizedDeleteEventRegistry infers the file name of a delete delta bucket
> from the bucket number (taken from the corresponding split), but this file
> may not exist if no deletions happened in that particular bucket.
> Earlier this was handled by always trying to open an ORC reader on the path
> and catching FileNotFoundException. After the refactor, however, we first
> look into the cache, and for that we first try to retrieve a file ID. This
> entails a getFileStatus call on HDFS, which returns null for non-existing
> paths and eventually causes the NPE.
> This needs to be guarded by a null check in Hadoop23Shims.
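
The null check described above can be sketched in isolation. The following is a minimal, self-contained illustration and not the actual Hadoop23Shims code: `FileIdGuard`, `FileInfo`, `NAMESPACE` and `lookup` are hypothetical stand-ins for the HDFS status lookup, and turning the null into a FileNotFoundException is an assumption, chosen because the description says callers previously handled that exception for missing delete delta files.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

public class FileIdGuard {

    /** Hypothetical stand-in for the status object returned by the file system lookup. */
    static final class FileInfo {
        final long fileId;

        FileInfo(long fileId) {
            this.fileId = fileId;
        }
    }

    /** Hypothetical in-memory namespace; a real implementation would ask HDFS instead. */
    static final Map<String, FileInfo> NAMESPACE = new HashMap<>();

    /** Returns null for paths that do not exist, mirroring the behaviour described in the issue. */
    static FileInfo lookup(String path) {
        return NAMESPACE.get(path);
    }

    /**
     * Resolves the file ID for a path. The null check converts a missing path into
     * a FileNotFoundException instead of letting a NullPointerException escape.
     */
    static long getFileId(String path) throws FileNotFoundException {
        FileInfo info = lookup(path);
        if (info == null) {
            // Assumption: callers already know how to handle FileNotFoundException.
            throw new FileNotFoundException("File does not exist: " + path);
        }
        return info.fileId;
    }

    public static void main(String[] args) {
        // Illustrative paths only; the bucket file names are made up for this sketch.
        NAMESPACE.put("/warehouse/t/delete_delta_1_1/bucket_00000", new FileInfo(16389L));

        String[] paths = {
            "/warehouse/t/delete_delta_1_1/bucket_00000", // exists
            "/warehouse/t/delete_delta_1_1/bucket_00001"  // no deletes in this bucket
        };
        for (String path : paths) {
            try {
                System.out.println(path + " has file ID " + getFileId(path));
            } catch (FileNotFoundException e) {
                // A missing delete delta bucket is an expected case, not an error.
                System.out.println("Skipping missing delete delta: " + e.getMessage());
            }
        }
    }
}
```

With a guard of this kind, a split whose bucket has no delete delta file takes the same "file not found" path that the pre-refactor ORC reader code relied on, rather than failing with an NPE during the cache lookup.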