Hi, everyone I come across with a problem about increasing the concurency. In a program, after shuffle write, each node should fetch 16 pair matrices to do matrix multiplication. such as:
*import breeze.linalg.{DenseMatrix => BDM} pairs.map(t => { val b1 = t._2._1.asInstanceOf[BDM[Double]] val b2 = t._2._2.asInstanceOf[BDM[Double]] val c = (b1 * b2).asInstanceOf[BDM[Double]] (new BlockID(t._1.row, t._1.column), c) })* Each node has 16 cores. However, no matter I set 16 tasks or more on each node, the concurrency cannot be higher than 60%, which means not every core on the node is computing. Then I check the running log on the WebUI, according to the amount of shuffle read and write in every task, I see some task do once matrix multiplication, some do twice while some do none. Thus, I think of using java multi thread to increase the concurrency. I wrote a program in scala which calls java multi thread without Spark on a single node, by watch the 'top' monitor, I find this program can use CPU up to 1500% ( means nearly every core are computing). But I have no idea how to use Java multi thread in RDD transformation. Is there any one can provide some example code to use Java multi thread in RDD transformation, or give any idea to increase the concurrency ? Thanks for all -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/How-to-use-multi-thread-in-RDD-map-function-tp8583.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org