Re: Review Request 27745: HIVE-8621 Dump small table join data for map-join [Spark Branch]

Jimmy Xiang Fri, 07 Nov 2014 15:55:20 -0800


> On Nov. 7, 2014, 9:51 p.m., Suhas Satish wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java, line 314
> > <https://reviews.apache.org/r/27745/diff/1/?file=754765#file754765line314>
> >
> >     What if there are 2 partitions for the big table? I guess they will then
> > be processed on 2 separate Spark nodes, right?
> >     
> >     So in this case, there are 2 replicas created for this HashTableSink.
> > How do we ensure that these 2 replicas end up on the same data nodes as
> > the ones where the 2 big table partitions will be processing the map-joins?


We can't, if we don't know where the big table partitions are. If there are
just two partitions, copying the small table to more nodes may take more time
than simply fetching the data over the network.


- Jimmy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27745/#review60388
-----------------------------------------------------------


On Nov. 7, 2014, 9:34 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27745/
> -----------------------------------------------------------
> 
> (Updated Nov. 7, 2014, 9:34 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8621
>     https://issues.apache.org/jira/browse/HIVE-8621
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> In the Spark case, HashTableSinkOperator should dump files to a folder
> expected by HashTableLoader.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/HashTableSinkOperator.java f0e04e7
> 
> Diff: https://reviews.apache.org/r/27745/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>
