[ 
https://issues.apache.org/jira/browse/HIVE-7503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-7503:
------------------------------

    Assignee:     (was: Xuefu Zhang)

I have unassigned this from myself, as I will not be able to work on it in the
next few weeks. (I will still take part in the discussions, though.) Anyone who
would like to work on this, please feel free to take it.

> Support Hive's multi-table insert query with Spark
> --------------------------------------------------
>
>                 Key: HIVE-7503
>                 URL: https://issues.apache.org/jira/browse/HIVE-7503
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> For Hive's multi-insert query 
> (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML), there 
> may be a MapReduce (MR) job for each insert. When we implement this with 
> Spark, it would be nice if all the inserts could happen concurrently.
> It seems that this functionality isn't available in Spark. To make things 
> worse, the source of the insert may be recomputed unless it's staged. Even 
> with staging, the inserts will happen sequentially, which hurts 
> performance.
> This task is to find out what it takes in Spark to enable this without 
> requiring the source to be staged or the inserts to run sequentially. If 
> this has to be solved in Hive, find an optimal way to do it.
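
To make the goal in the description concrete, here is a rough sketch of the
desired behavior: the shared source is evaluated once, and every insert is
submitted at the same time instead of one after another. This is plain Python
with in-memory lists, not Spark or Hive code. The names scan_source,
insert_into, and multi_insert are hypothetical and exist only for this
illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the shared source of a multi-insert query
# (the FROM clause). The counter records how often it is evaluated.
scan_count = 0


def scan_source():
    global scan_count
    scan_count += 1
    return [{"id": i, "amount": i * 10} for i in range(6)]


def insert_into(table, rows, predicate):
    # Hypothetical "insert": copy the matching rows into an in-memory table.
    table.extend(r for r in rows if predicate(r))


def multi_insert(targets):
    # Evaluate the source once and keep it in memory, so that no insert
    # has to recompute it.
    rows = scan_source()
    # Submit every insert up front so they can run concurrently.
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = [pool.submit(insert_into, table, rows, pred)
                   for table, pred in targets]
        for f in futures:
            f.result()  # re-raise any failure from a worker


small, large = [], []
multi_insert([
    (small, lambda r: r["amount"] < 30),
    (large, lambda r: r["amount"] >= 30),
])
```

Each target list is written by exactly one worker, so the sketch needs no
locking. Whether Spark can offer an equivalent (one evaluation of the source
feeding several concurrent writes) is exactly the open question of this task.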



--
This message was sent by Atlassian JIRA
(v6.2#6252)
