[ https://issues.apache.org/jira/browse/HIVE-18148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288814#comment-16288814 ]
liyunzhang commented on HIVE-18148:
-----------------------------------

[~lirui]: I tried the example you provided.
{code}
set hive.spark.dynamic.partition.pruning=true;
explain select * from src join part1 on src.key=part1.p join part2 on src.value=part2.q;
{code}
But in my env (latest build: 095e6bf8988a03875bc9340b2ab82d5d13c4cb3c), the physical plan before SparkCompiler#removeNestedDPP is
{code}
TS[0]-FIL[22]-SEL[2]-RS[9]-MAPJOIN[32]-MAPJOIN[31]-SEL[15]-FS[16]
TS[3]-FIL[23]-SEL[5]-MAPJOIN[32]
TS[6]-FIL[24]-SEL[8]-RS[13]-MAPJOIN[31]
{code}
There is no DPP operator for removeNestedDPP to traverse because of HIVE-17087: SparkMapJoinOptimizer#convertJoinMapJoin removes the DPP operator when there is a map join operator. So did you reproduce the NPE by setting {{hive.auto.convert.join.noconditionaltask}} to false (see the sketch at the end of this message)? BTW, how does Hive on Tez deal with this kind of nested DPP case?

> NPE in SparkDynamicPartitionPruningResolver
> -------------------------------------------
>
>          Key: HIVE-18148
>          URL: https://issues.apache.org/jira/browse/HIVE-18148
>      Project: Hive
>   Issue Type: Bug
>   Components: Spark
>     Reporter: Rui Li
>     Assignee: Rui Li
>  Attachments: HIVE-18148.1.patch
>
>
> The stack trace is:
> {noformat}
> 2017-11-27T10:32:38,752 ERROR [e6c8aab5-ddd2-461d-b185-a7597c3e7519 main] ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver$SparkDynamicPartitionPruningDispatcher.dispatch(SparkDynamicPartitionPruningResolver.java:100)
>   at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
>   at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
>   at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
>   at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver.resolve(SparkDynamicPartitionPruningResolver.java:74)
>   at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeTaskPlan(SparkCompiler.java:568)
> {noformat}
> At this stage, there shouldn't be a DPP sink whose target map work is null. The root cause seems to be a malformed operator tree generated by SplitOpTreeForDPP.
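A minimal repro sketch for the question above, assuming the same src/part1/part2 tables as in the original snippet; the only change is disabling the noconditionaltask map join conversion, so the DPP operator should survive into the physical plan for removeNestedDPP to see:
{code}
-- hypothetical repro: keep Spark DPP on, but turn off the noconditionaltask
-- map join path so convertJoinMapJoin does not strip the DPP branch
set hive.spark.dynamic.partition.pruning=true;
set hive.auto.convert.join.noconditionaltask=false;
explain select * from src join part1 on src.key=part1.p join part2 on src.value=part2.q;
{code}
For the Tez comparison asked about above, one could inspect the plan for the same query under Tez (assuming a Tez-enabled build; Tez DPP is controlled by a separate flag):
{code}
-- sketch only: compare how the Tez planner handles the nested DPP case
set hive.execution.engine=tez;
set hive.tez.dynamic.partition.pruning=true;
explain select * from src join part1 on src.key=part1.p join part2 on src.value=part2.q;
{code}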