[ https://issues.apache.org/jira/browse/HIVE-18148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16297875#comment-16297875 ]
liyunzhang commented on HIVE-18148:
-----------------------------------

Sorry for the late reply. I still have one question about the code:
{code}
/** For DPP sinks w/ common join, we'll split the tree and what's above the branching
 * operator is computed multiple times. Therefore it may not be good for performance to support
 * nested DPP sinks, i.e. one DPP sink depends on other DPP sinks.
 * The following is an example:
 *
 *     TS       TS
 *      |        |
 *     ...      FIL
 *      |        |  \
 *      RS      RS  SEL
 *        \    /     |
 * TS      JOIN     GBY
 *  |     /    \     |
 *  RS   RS    SEL  DPP2
 *   \   /      |
 *    JOIN     GBY
 *              |
 *             DPP1
 *
 * where DPP1 depends on DPP2.
 *
 * To avoid such case, we'll visit all the branching operators. If a branching operator has any
 * further away DPP branches w/ common join in its sub-tree, such branches will be removed.
 * In the above example, the branch of DPP1 will be removed.
 */
{code}
This function first collects the branching operators (FIL and JOIN in the example above) and then removes the nested DPPs found in their branches. If it traverses FIL first, it removes DPP1; if it traverses JOIN first, it removes DPP2. In other words, the function removes one of the nested DPPs arbitrarily, depending on traversal order. What I am confused about is how to judge which DPP needs to be removed. If my understanding is not right, please tell me.
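A toy sketch of the order-dependence described above (plain Java; the names `survivor` and `dppOfBranch`, and the FIL/JOIN-to-DPP mapping, are illustrative assumptions for this question, not Hive's actual resolver API):

```java
import java.util.*;

// Toy model: each branching operator "owns" the nested DPP branch reachable
// from it. Visiting a branching operator cuts its DPP branch if the nesting
// still exists, so which sink survives depends entirely on the visit order.
class NestedDppToy {
    static String survivor(List<String> branchingVisitOrder,
                           Map<String, String> dppOfBranch,
                           Set<String> nestedDpps) {
        Set<String> alive = new HashSet<>(nestedDpps);
        for (String branching : branchingVisitOrder) {
            String dpp = dppOfBranch.get(branching);
            // Only cut while more than one nested sink remains.
            if (alive.size() > 1 && alive.contains(dpp)) {
                alive.remove(dpp);
            }
        }
        return alive.iterator().next();  // the DPP sink that was kept
    }
}
```

With a hypothetical mapping of FIL to DPP1 and JOIN to DPP2 (matching the claim that traversing FIL first removes DPP1), the visit order [FIL, JOIN] keeps DPP2 while [JOIN, FIL] keeps DPP1 -- which is exactly the arbitrariness being asked about.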
> NPE in SparkDynamicPartitionPruningResolver
> -------------------------------------------
>
>                 Key: HIVE-18148
>                 URL: https://issues.apache.org/jira/browse/HIVE-18148
>             Project: Hive
>          Issue Type: Bug
>          Components: Spark
>            Reporter: Rui Li
>            Assignee: Rui Li
>         Attachments: HIVE-18148.1.patch, HIVE-18148.2.patch
>
> The stack trace is:
> {noformat}
> 2017-11-27T10:32:38,752 ERROR [e6c8aab5-ddd2-461d-b185-a7597c3e7519 main] ql.Driver: FAILED: NullPointerException null
> java.lang.NullPointerException
>   at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver$SparkDynamicPartitionPruningDispatcher.dispatch(SparkDynamicPartitionPruningResolver.java:100)
>   at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
>   at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:180)
>   at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:125)
>   at org.apache.hadoop.hive.ql.optimizer.physical.SparkDynamicPartitionPruningResolver.resolve(SparkDynamicPartitionPruningResolver.java:74)
>   at org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeTaskPlan(SparkCompiler.java:568)
> {noformat}
> At this stage, there shouldn't be a DPP sink whose target map work is null. The root cause seems to be a malformed operator tree generated by SplitOpTreeForDPP.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)