Our cluster is a standalone cluster with 16 compute nodes, and each node has
16 cores. When I set SPARK_WORKER_INSTANCES to 1 and SPARK_WORKER_CORES to 32,
and we submit 512 tasks in total (16 nodes x 32 advertised cores = 512 slots),
the concurrency improves. But if I set SPARK_WORKER_INSTANCES to 2 and
SPARK_WORKER_CORES to 16, it doesn't work as well.
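
For reference, a minimal Scala sketch (not from this thread; spark.default.parallelism
is a standard Spark setting, while the app name, the RDD names blocksA/blocksB and the
value 512 are illustrative, 512 mirroring the task count above) of pinning the number
of tasks per stage on the driver side instead of over-committing the worker cores:

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: ask for 512 partitions by default, so each shuffle stage runs
// 512 tasks regardless of how many cores every worker advertises.
val conf = new SparkConf()
  .setAppName("BlockMatrixMultiply")           // illustrative name
  .set("spark.default.parallelism", "512")     // default partition count for shuffles
val sc = new SparkContext(conf)

// Alternatively, fix the partition count of the shuffle that produces `pairs`,
// e.g. (blocksA and blocksB are hypothetical input RDDs of matrix blocks):
// val pairs = blocksA.join(blocksB, numPartitions = 512)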

Thank you for your reply.


Yi Tian wrote
> for yarn-client mode:
>  
> SPARK_EXECUTOR_CORES * SPARK_EXECUTOR_INSTANCES = 2(or 3) *
> TotalCoresOnYourCluster
> 
> for standalone mode:
> 
> SPARK_WORKER_INSTANCES * SPARK_WORKER_CORES = 2(or 3) *
> TotalCoresOnYourCluster
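
(Worked through for the cluster in this thread, reading the rule as total advertised
worker cores across the cluster: 16 nodes x 16 cores = 256 physical cores, so the
target is 2 x 256 = 512 up to 3 x 256 = 768 advertised cores. For example,
SPARK_WORKER_INSTANCES=1 and SPARK_WORKER_CORES=32 on each of the 16 nodes gives
16 x 1 x 32 = 512.)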
> 
> 
> 
> Best Regards,
> 
> Yi Tian

> tianyi.asiainfo@
> 
> On Sep 28, 2014, at 17:59, myasuka <myasuka@> wrote:
> 
>> Hi, everyone
>>    I have come across a problem with increasing the concurrency. In a
>> program, after the shuffle write, each node should fetch 16 pairs of
>> matrices to do matrix multiplication, such as:
>> 
>> import breeze.linalg.{DenseMatrix => BDM}
>> 
>> pairs.map(t => {
>>        val b1 = t._2._1.asInstanceOf[BDM[Double]]
>>        val b2 = t._2._2.asInstanceOf[BDM[Double]]
>> 
>>        val c = (b1 * b2).asInstanceOf[BDM[Double]]
>> 
>>        (new BlockID(t._1.row, t._1.column), c)
>>      })
>> 
>>    Each node has 16 cores. However, no matter whether I set 16 tasks or
>> more per node, the concurrency cannot get higher than 60%, which means not
>> every core on the node is computing. Then I checked the running log on the
>> WebUI: judging from the amount of shuffle read and write in each task, some
>> tasks do one matrix multiplication, some do two, and some do none.
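
(A small sketch, not from the thread, of one way to confirm that imbalance: count how
many block pairs each partition actually holds. `pairs` is the RDD from the snippet
above; everything else is illustrative.)

// Sketch: print the number of records per partition; partitions holding 0, 1
// or 2 block pairs would explain why some cores sit idle.
val partitionSizes = pairs.mapPartitions(it => Iterator(it.size)).collect()
partitionSizes.zipWithIndex.foreach { case (n, i) =>
  println(s"partition $i: $n block pairs")
}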
>> 
>>    Thus, I thought of using Java multi-threading to increase the
>> concurrency. I wrote a Scala program that uses Java multi-threading without
>> Spark on a single node, and by watching the 'top' monitor I found that this
>> program can push CPU usage up to 1500% (meaning nearly every core is
>> computing). But I have no idea how to use Java multi-threading in an RDD
>> transformation.
>> 
>>    Can anyone provide some example code for using Java multi-threading in
>> an RDD transformation, or give any other idea to increase the concurrency?
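
A minimal sketch of one common approach (not from this thread): run the multiplications
of each partition on a local thread pool inside mapPartitions, so that a single Spark
task keeps several cores busy. `pairs` and `BlockID` are the names from the snippet
above; the pool size of 16 and everything else here are illustrative assumptions.

import java.util.concurrent.{Callable, Executors}
import scala.collection.JavaConverters._
import breeze.linalg.{DenseMatrix => BDM}

val multiplied = pairs.mapPartitions { iter =>
  // One fixed-size pool per partition/task; invokeAll blocks until all
  // multiplications of this partition have finished.
  val pool = Executors.newFixedThreadPool(16)
  val jobs = iter.map { t =>
    new Callable[(BlockID, BDM[Double])] {
      override def call(): (BlockID, BDM[Double]) = {
        val b1 = t._2._1.asInstanceOf[BDM[Double]]
        val b2 = t._2._2.asInstanceOf[BDM[Double]]
        (new BlockID(t._1.row, t._1.column), b1 * b2)
      }
    }
  }.toList
  val results = pool.invokeAll(jobs.asJava).asScala.map(_.get()).iterator
  pool.shutdown()
  results
}

Note that this buffers a whole partition in memory, and if every task starts its own
16-thread pool you would normally also raise spark.task.cpus (a standard Spark setting)
so the scheduler does not co-locate many such tasks on one 16-core node.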
>> 
>> Thanks for all





--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-use-multi-thread-in-RDD-map-function-tp8583p8594.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
