[ https://issues.apache.org/jira/browse/HIVE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208080#comment-14208080 ]
Xuefu Zhang commented on HIVE-8793:
-----------------------------------

Hi [~lirui], thanks for working on this. The above task graph change is expected. My only concern is whether, and how, Spark's RDD.cache() is utilized. Reducer5 and Reducer6 will have the same input and the same shuffle, so it's inefficient for them to do the same work repeatedly. HIVE-8118 made it possible to add RDD cache() when SparkPlanGenerator generates the plan, but I'm not sure that logic is still in place. I will take a look at your patch to understand more about this.

Thanks.

> Make sure multi-insert works with map join [Spark Branch]
> ---------------------------------------------------------
>
>                 Key: HIVE-8793
>                 URL: https://issues.apache.org/jira/browse/HIVE-8793
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Rui Li
>         Attachments: HIVE-8793.1-spark.patch, HIVE-8793.2-spark.patch
>
>
> Currently, HIVE-8622 is implemented based on an assumption, that for a map
> join query, a BaseWork would not have multiple children. By testing through
> subquery_multiinsert.q did reveal that's the case. But, we need to
> investigate on this, and make sure this won't happen in general.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
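
To illustrate the RDD.cache() concern raised in the comment above, here is a minimal, self-contained sketch of why two consumers of the same lazily evaluated shuffle end up repeating the work unless the shared result is cached. It is a toy model only: `LazyDataset`, `shuffleRuns`, and the sample rows are hypothetical names invented for illustration, and nothing here is Spark's or Hive's actual API or the logic in SparkPlanGenerator.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Toy model only: not Spark's or Hive's API. */
public class SharedShuffleDemo {

    /** Hypothetical stand-in for a lazily evaluated RDD. */
    static final class LazyDataset<T> {
        private final Supplier<T> compute;
        private boolean cacheEnabled;
        private T cached;

        LazyDataset(Supplier<T> compute) {
            this.compute = compute;
        }

        /** Mimics the intent of RDD.cache(): keep the first result for reuse. */
        LazyDataset<T> cache() {
            cacheEnabled = true;
            return this;
        }

        /** Recomputes on every call unless caching was requested. */
        T collect() {
            if (!cacheEnabled) {
                return compute.get();
            }
            if (cached == null) {
                cached = compute.get();
            }
            return cached;
        }
    }

    /** Returns how many times the shared "shuffle" ran for two consumers. */
    public static int shuffleRuns(boolean useCache) {
        AtomicInteger runs = new AtomicInteger();
        List<String> rows = List.of("a", "b", "a", "c");

        // The shared parent: group rows by key, counting each run.
        LazyDataset<Map<String, Integer>> shuffled = new LazyDataset<>(() -> {
            runs.incrementAndGet();
            Map<String, Integer> counts = new TreeMap<>();
            for (String r : rows) {
                counts.merge(r, 1, Integer::sum);
            }
            return counts;
        });
        if (useCache) {
            shuffled.cache();
        }

        // Two downstream consumers, standing in for Reducer5 and Reducer6.
        int distinctKeys = shuffled.collect().size();
        int totalRows = shuffled.collect().values().stream()
                .mapToInt(Integer::intValue).sum();
        if (distinctKeys != 3 || totalRows != 4) {
            throw new IllegalStateException("unexpected result");
        }
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("shuffle runs without cache: " + shuffleRuns(false));
        System.out.println("shuffle runs with cache:    " + shuffleRuns(true));
    }
}
```

In this model the shared computation runs twice without `cache()` and once with it, which is the saving the comment is asking about for Reducer5 and Reducer6.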