Re: Review Request 27687: HIVE-8649 Increase level of parallelism in reduce phase [Spark Branch]

Xuefu Zhang Thu, 06 Nov 2014 12:28:31 -0800


> On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 75
> > <https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line75>
> >
> >     I don't feel we need to cache this, as this can change during a user 
> > session.
> 
> Jimmy Xiang wrote:
>     Yes, it will change during a user session. I was thinking of updating 
> this when things change, based on some event callbacks.
>     
>     Such info may be needed many times if there are many reducers. Caching 
> it should save us trips to the Spark master (assuming 
> getExecutorMemoryStatus checks with the master).


1. I don't think there will be a callback.
2. Yeah, it will be called many times if there are multiple reducers. 
Therefore, it probably makes sense to keep the info in 
SetSparkReducerParallelism, which is created for each query.
3. You also need to make sure this works for a Spark standalone cluster. I'm 
not sure if you can get the number of executors/memory in the same way.
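
To illustrate point 2, here is a minimal sketch of fetching the cluster info once per query and reusing it for every reducer. It is not the patch's code: ClusterInfo, PerQueryClusterInfo, and the Supplier-based fetcher are hypothetical names, and the real lookup against the Spark master is left to the caller.

```java
import java.util.function.Supplier;

/** Hypothetical snapshot of cluster resources; not a Hive or Spark class. */
final class ClusterInfo {
    final int executorCount;
    final long memoryPerExecutorBytes;

    ClusterInfo(int executorCount, long memoryPerExecutorBytes) {
        this.executorCount = executorCount;
        this.memoryPerExecutorBytes = memoryPerExecutorBytes;
    }
}

/** Hypothetical per-query holder: asks the fetcher at most once. */
final class PerQueryClusterInfo {
    // Assumption: the fetcher wraps whatever call reaches the Spark master.
    private final Supplier<ClusterInfo> fetcher;
    private ClusterInfo cached; // null until the first lookup

    PerQueryClusterInfo(Supplier<ClusterInfo> fetcher) {
        this.fetcher = fetcher;
    }

    /** Returns the snapshot, fetching it lazily on the first call only. */
    ClusterInfo get() {
        if (cached == null) {
            cached = fetcher.get();
        }
        return cached;
    }
}
```

Because a new holder would be created with each per-query object, the snapshot is never older than the current query, which avoids relying on a callback to invalidate it. The holder is deliberately not thread-safe, on the assumption that a single thread processes one query.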


> On Nov. 6, 2014, 7:01 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java, line 89
> > <https://reviews.apache.org/r/27687/diff/1/?file=751768#file751768line89>
> >
> >     I'm not sure why this needs to be synchronized. Will this method be 
> > called by concurrent threads? It doesn't seem to be the case.
> 
> Jimmy Xiang wrote:
>     Are you saying it won't be called by many threads? Each JVM can run one 
> query at a time in all deployment modes? How come SparkClient.getInstance 
> is synchronized?

Yeah. Right now this is a little messy. Changes are coming. Concurrency isn't 
tested yet. It's fine to leave the synchronization there.
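
For context, a synchronized lazy accessor of the kind discussed here generally looks like the sketch below. ClientHolder is a hypothetical stand-in, not the actual SparkClient code; it only shows why the keyword is harmless when there is a single caller and necessary when there are several.

```java
/** Hypothetical stand-in for a lazily created client singleton. */
final class ClientHolder {
    private static ClientHolder instance;

    private ClientHolder() {
    }

    /**
     * synchronized makes the null check and the construction one atomic
     * step, so two threads cannot both create an instance.
     */
    static synchronized ClientHolder getInstance() {
        if (instance == null) {
            instance = new ClientHolder();
        }
        return instance;
    }
}
```

With one thread the lock is uncontended and costs very little, so keeping it until concurrency is actually tested is a low-risk choice.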


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27687/#review60210
-----------------------------------------------------------


On Nov. 6, 2014, 5:25 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27687/
> -----------------------------------------------------------
> 
> (Updated Nov. 6, 2014, 5:25 p.m.)
> 
> 
> Review request for hive and Xuefu Zhang.
> 
> 
> Bugs: HIVE-8649
>     https://issues.apache.org/jira/browse/HIVE-8649
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> First patch for HIVE-8649, to increase the number of reducers for Spark based 
> on some info about the Spark cluster.
> We need to add a SparkListener to handle cluster status changes if such events 
> are supported by Spark.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkClient.java 5766787 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SetSparkReducerParallelism.java 2dbb5a3 
> 
> Diff: https://reviews.apache.org/r/27687/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>
