[ https://issues.apache.org/jira/browse/HIVE-16046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15885458#comment-15885458 ]
Rui Li commented on HIVE-16046:
-------------------------------
Details on why we didn't choose broadcast for map join can be found in HIVE-7613. But I agree we may want to revisit this.

> Broadcasting small table for Hive on Spark
> ------------------------------------------
>
>                 Key: HIVE-16046
>                 URL: https://issues.apache.org/jira/browse/HIVE-16046
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>
> Currently the Spark plan is:
> {code}
> 1. TS(Small table)->Sel/Fil->HashTableSink
>
> 2. TS(Small table)->Sel/Fil->HashTableSink
>
> 3. HashTableDummy --
>                     |
>    HashTableDummy --
>                     |
>    RootTS(Big table)->Sel/Fil->MapJoin-->Sel/Fil->FileSink
> {code}
> 1. Run the small-table SparkWorks on the Spark cluster, which dump the hash maps to files.
> 2. Run the SparkWork for the big table on the Spark cluster. Mappers will look up the small-table hash map from the file using HashTableDummy's loader.
> The disadvantage of the current implementation is that it takes a long time to distribute the hash table via the distributed cache when the hash table is large. Here we want to use sparkContext.broadcast() to store the small table, although this keeps the broadcast variable on the driver and causes some performance decline there.
> [~Fred], [~xuefuz], [~lirui] and [~csun], please give some suggestions on it.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
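The broadcast-based map join proposed above can be sketched as follows. This is a minimal Python simulation of the idea, not actual Hive or Spark code; the names `broadcast_map_join`, `small_table`, `big_table`, and `mapper` are hypothetical. The point it illustrates: the driver builds the small-table hash table once and ships it to every mapper (as `sparkContext.broadcast()` would), instead of each task loading a dumped hash-table file from the distributed cache.

```python
def broadcast_map_join(small_table, big_table, num_partitions=2):
    """Inner-join big_table rows against a broadcast copy of small_table.

    small_table: list of (key, value) pairs -- the small (dimension) table.
    big_table:   list of (key, value) pairs -- the big (fact) table.
    """
    # Driver side: build the hash table once. In real Spark this would be
    # wrapped as: bcast = sparkContext.broadcast(hash_table)
    hash_table = {}
    for key, value in small_table:
        hash_table.setdefault(key, []).append(value)

    # Split the big table into partitions, as Spark would for its mappers.
    partitions = [big_table[i::num_partitions] for i in range(num_partitions)]

    def mapper(partition):
        # Each mapper probes the broadcast hash table locally -- no per-task
        # download of a hash-table dump file from the distributed cache.
        for key, big_value in partition:
            for small_value in hash_table.get(key, []):
                yield (key, big_value, small_value)

    results = []
    for partition in partitions:
        results.extend(mapper(partition))
    return results


small = [(1, "a"), (2, "b")]
big = [(1, "x"), (2, "y"), (3, "z"), (1, "w")]
joined = broadcast_map_join(small, big)
# Keys absent from the small table (key 3) are dropped, as in an inner map join.
```

The trade-off mentioned in the description shows up here too: the full `hash_table` lives in driver memory before being shipped, so a very large small table would pressure the driver.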