[jira] [Commented] (HIVE-8793) Make sure multi-insert works with map join [Spark Branch]

Xuefu Zhang (JIRA) Wed, 12 Nov 2014 06:46:19 -0800

    [ https://issues.apache.org/jira/browse/HIVE-8793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208080#comment-14208080 ]


Xuefu Zhang commented on HIVE-8793:
-----------------------------------

Hi [~lirui], thanks for working on this. The above task graph change is 
expected. My only concern is whether and how Spark's RDD.cache() is 
utilized. Reducer5 and Reducer6 will have the same input and the same shuffle, 
so it's inefficient for them to do the same work repeatedly. With HIVE-8118, 
RDD cache() can be added when SparkPlanGenerator generates the plan, but I'm 
not sure that logic is still in place. I will take a look at your patch to 
understand more about this. Thanks.
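
To make the concern concrete, here is a minimal Java sketch of why a shared 
parent is worth caching. It does not use Spark or Hive code: LazyData, 
runTwoChildren, and the sample values are hypothetical stand-ins, and the 
counter merely represents the shared shuffle that both reducers would read.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class SharedShuffleSketch {
    /** Toy stand-in for a lazily evaluated dataset; NOT Spark's RDD class. */
    static final class LazyData<T> {
        private final Supplier<List<T>> compute;
        private List<T> cached;        // filled on first use after cache() was called
        private boolean cacheEnabled;

        LazyData(Supplier<List<T>> compute) { this.compute = compute; }

        /** Marks the data for reuse, loosely mirroring the idea behind RDD.cache(). */
        LazyData<T> cache() { cacheEnabled = true; return this; }

        /** Recomputes on every call unless caching is enabled. */
        List<T> collect() {
            if (!cacheEnabled) return compute.get();
            if (cached == null) cached = compute.get();
            return cached;
        }
    }

    /** Runs two consumers over one shared parent; returns how often the parent was computed. */
    static int runTwoChildren(boolean useCache) {
        AtomicInteger shuffleRuns = new AtomicInteger();
        LazyData<Integer> shuffled = new LazyData<>(() -> {
            shuffleRuns.incrementAndGet();   // stands for the expensive shared shuffle
            return List.of(3, 1, 2);
        });
        if (useCache) shuffled.cache();

        // Two children with the same input, in the role of Reducer5 and Reducer6.
        List<Integer> child5 = shuffled.collect().stream()
                .map(x -> x * 10).collect(Collectors.toList());
        List<Integer> child6 = shuffled.collect().stream()
                .filter(x -> x > 1).collect(Collectors.toList());
        if (child5.size() != 3 || child6.size() != 2) {
            throw new IllegalStateException("unexpected result");
        }
        return shuffleRuns.get();
    }

    public static void main(String[] args) {
        System.out.println("parent computed without cache: " + runTwoChildren(false));
        System.out.println("parent computed with cache: " + runTwoChildren(true));
    }
}
```

In this toy model the parent is computed once per child unless it is marked 
for caching. Whether the plan generated for this patch behaves the same way 
is exactly what needs to be checked.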

> Make sure multi-insert works with map join [Spark Branch]
> ---------------------------------------------------------
>
>                 Key: HIVE-8793
>                 URL: https://issues.apache.org/jira/browse/HIVE-8793
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Rui Li
>         Attachments: HIVE-8793.1-spark.patch, HIVE-8793.2-spark.patch
>
>
> Currently, HIVE-8622 is implemented based on the assumption that, for a map 
> join query, a BaseWork would not have multiple children. Testing with 
> subquery_multiinsert.q did reveal that's the case. But we need to 
> investigate this and make sure this won't happen in general.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
