[ https://issues.apache.org/jira/browse/HIVE-12712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15065140#comment-15065140 ]
Sergey Shelukhin commented on HIVE-12712:
-----------------------------------------

+1 pending a test run

> HiveInputFormat may fail to column names to read in some cases
> --------------------------------------------------------------
>
>                 Key: HIVE-12712
>                 URL: https://issues.apache.org/jira/browse/HIVE-12712
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: Takahiko Saito
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-12712.1.patch, HIVE-12712.2.patch
>
>
> The primary issue is that, when the plan is generated, the pathToAliases map is
> populated with mappings from directory paths to table aliases. pathToAliases.put()
> uses path.toString() as the map key, but during probing path.toUri().toString()
> is used. This can cause probe misses when the path contains spaces: path.toUri()
> escapes the spaces in the path, whereas path.toString() does not. As a result,
> HiveInputFormat can take a different code path, which may fail to set the list of
> columns to read from the source table. This caused an unexpected NPE in
> OrcInputFormat, because the refactoring in HIVE-11705 removed the null check for
> column names.
> The resulting exception is:
> {code}
> Caused by: java.lang.RuntimeException: ORC split generation failed with exception: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1288)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1354)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:367)
> 	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:457)
> 	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:152)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:246)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:240)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:240)
> 	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:227)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> 	... 3 more
> Caused by: java.util.concurrent.ExecutionException: java.lang.NullPointerException
> 	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> 	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1282)
> 	... 15 more
> Caused by: java.lang.NullPointerException
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.extractNeededColNames(OrcInputFormat.java:422)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.extractNeededColNames(OrcInputFormat.java:417)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.access$2000(OrcInputFormat.java:134)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1072)
> 	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:919)
> 	... 4 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
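The path-key probe miss described in the issue can be reproduced in plain Java. This is a minimal sketch that uses java.net.URI directly to avoid a Hadoop dependency; it assumes that Hadoop's Path.toString() yields the decoded form (scheme, "://", authority, then the unescaped URI path), while path.toUri().toString() yields the escaped java.net.URI string. The directory hdfs://nn:8020/warehouse/my table, the alias t and the helper names unescaped()/escaped() are hypothetical and exist only for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

public class PathKeyMismatch {

    // Assumption: approximates Hadoop's Path.toString(), i.e. the decoded
    // (unescaped) form of the path. Hypothetical helper for illustration.
    static String unescaped(URI uri) {
        return uri.getScheme() + "://" + uri.getAuthority() + uri.getPath();
    }

    // Approximates path.toUri().toString(), i.e. the escaped URI string.
    static String escaped(URI uri) {
        return uri.toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical table directory whose name contains a space.
        // The multi-argument URI constructor quotes the space as %20.
        URI dir = new URI("hdfs", "nn:8020", "/warehouse/my table", null, null);

        // Plan time: keyed with the unescaped string,
        // analogous to pathToAliases.put(path.toString(), ...).
        Map<String, String> pathToAliases = new HashMap<>();
        pathToAliases.put(unescaped(dir), "t");

        // Probe time: looked up with the escaped string,
        // analogous to probing with path.toUri().toString().
        String probeKey = escaped(dir);
        System.out.println("plan key:  " + unescaped(dir));
        System.out.println("probe key: " + probeKey);
        System.out.println("lookup:    " + pathToAliases.get(probeKey)); // null, a probe miss

        // One possible remedy (a sketch, not necessarily what the attached
        // patches do): derive the put key and the probe key the same way.
        System.out.println("consistent lookup: " + pathToAliases.get(unescaped(dir)));
    }
}
```

Running it should print the unescaped plan key, the %20-escaped probe key, a null lookup result, and then t for the consistent lookup. Deriving both keys with the same conversion is only one way to avoid the miss; the sketch does not claim to reproduce the change made in HIVE-12712.1.patch or HIVE-12712.2.patch.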