[ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16038414#comment-16038414 ]
liyunzhang_intel commented on HIVE-11297:
-----------------------------------------

[~csun]: We cannot do that, because GenSparkProcContext#clonedPruningTableScanSet is passed to GenSparkWorkWalker#startWalking as its topNodes, and GenSparkWorkWalker splits the tree at minimum cost. So if there is only one top node, it splits the following tree:
{noformat}
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
             -SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{noformat}
into only one tree:
{noformat}
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
{noformat}
The GenSparkWork log shows this. The only pruning sink that appears as a leaf operator is SPARKPRUNINGSINK[20]; SPARKPRUNINGSINK[23] never appears:
{code}
[root@bdpe41 hive]# grep GenSparkWork logs/hive.log
2017-06-06T16:34:12,527 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: TS[0]
2017-06-06T16:34:12,527 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: RS[2]
2017-06-06T16:34:19,070 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: RS[2]
2017-06-06T16:34:19,070 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: JOIN[5]
2017-06-06T16:34:19,070 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: RS[9]
2017-06-06T16:34:22,858 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Removing RS[2] as parent from JOIN[5]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Removing RS[4] as parent from JOIN[5]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: RS[9]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: GBY[10]
2017-06-06T16:34:22,859 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: FS[12]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Removing RS[9] as parent from GBY[10]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: FS[12]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: TS[1]
2017-06-06T16:34:27,322 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: RS[4]
2017-06-06T16:36:14,669 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Second pass. Leaf operator: RS[4] has common downstream work:org.apache.hadoop.hive.ql.plan.ReduceWork@7e7f72
2017-06-06T16:36:14,672 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Root operator: TS[1]
2017-06-06T16:36:14,672 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: Leaf operator: SPARKPRUNINGSINK[20]
2017-06-06T16:38:22,338 DEBUG [7e080689-d76b-498f-9a41-d8843a9b199f main] spark.GenSparkWork: First pass. Leaf operator: SPARKPRUNINGSINK[20]
{code}

> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
> Key: HIVE-11297
> URL: https://issues.apache.org/jira/browse/HIVE-11297
> Project: Hive
> Issue Type: Bug
> Affects Versions: spark-branch
> Reporter: Chao Sun
> Assignee: liyunzhang_intel
> Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates
> partition info for more than one partition column, multiple operator trees
> are created, which all start from the same table scan op but have different
> Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that we don't have to do
> the table scan multiple times.
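The optimization proposed in the issue description can be illustrated with a small, self-contained Java sketch: one table scan feeding several pruning-sink branches, all reachable from a single top node. It does not use Hive's classes. `CombineOpTrees`, `Op`, `combine` and `leaves` are hypothetical names. Operators are identified only by a text label, on the assumption that roots with equal labels scan the same table. The sketch keeps the first root, re-attaches the other roots' children under it, and then lists every leaf reachable from that one root (requires Java 9+ for `List.of`):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these are not Hive's Operator classes.
public class CombineOpTrees {

    // Simplified operator node: a label plus its child operators.
    static final class Op {
        final String name;
        final List<Op> children = new ArrayList<>();

        Op(String name) { this.name = name; }

        // Appends a child and returns it, so operator chains can be built fluently.
        Op add(Op child) {
            children.add(child);
            return child;
        }
    }

    // Merges trees whose roots scan the same table (assumed: equal labels)
    // into one tree: the first root is kept and the other roots' children are
    // re-attached under it, so the table is scanned only once.
    static Op combine(List<Op> roots) {
        Op combined = roots.get(0);
        for (Op other : roots.subList(1, roots.size())) {
            if (!other.name.equals(combined.name)) {
                throw new IllegalArgumentException("roots scan different tables: " + other.name);
            }
            combined.children.addAll(other.children);
        }
        return combined;
    }

    // Collects the leaf operators (e.g. pruning sinks) reachable from root, depth first.
    static List<String> leaves(Op root) {
        List<String> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Op op, List<String> out) {
        if (op.children.isEmpty()) {
            out.add(op.name);
            return;
        }
        for (Op child : op.children) {
            collect(child, out);
        }
    }

    public static void main(String[] args) {
        // Two trees, one per partition column, each starting with a scan of the same small table.
        Op ts1 = new Op("TS:small_table");
        ts1.add(new Op("FIL")).add(new Op("SEL:col_a")).add(new Op("GBY")).add(new Op("SPARKPRUNINGSINK:col_a"));
        Op ts2 = new Op("TS:small_table");
        ts2.add(new Op("FIL")).add(new Op("SEL:col_b")).add(new Op("GBY")).add(new Op("SPARKPRUNINGSINK:col_b"));

        Op combined = combine(List.of(ts1, ts2));
        // A single table scan; both pruning sinks are reachable from this one top node.
        System.out.println(leaves(combined));
    }
}
```

Running `main` prints `[SPARKPRUNINGSINK:col_a, SPARKPRUNINGSINK:col_b]`. The sketch merges only at the table scan. In the tree from the comment above, the two branches also share FIL[17] and diverge at SEL[18] and SEL[21]. It says nothing about how GenSparkWorkWalker then splits such a tree into works.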