Hi All,
I'm using Spark 1.4.1 to analyze a largish data set (several gigabytes
of data). The RDD is split into 2048 roughly equal partitions and is
entirely cached in RAM.
I evaluated the performance on several cluster sizes and am seeing a
non-linear (power-law) performance improvement as the cluster size
increases (plot below). Each node has 4 cores, and each worker is
configured to use 10 GB of RAM.
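
For reference, my setup looks roughly like the following (the input
path and app name are placeholders, and the memory setting mirrors the
worker configuration above):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CachedRddBenchmark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CachedRddBenchmark")
      .set("spark.executor.memory", "10g") // each worker uses 10 GB of RAM

    val sc = new SparkContext(conf)

    val data = sc.textFile("hdfs:///path/to/data") // placeholder input path
      .repartition(2048)                           // 2048 roughly equal partitions
      .persist(StorageLevel.MEMORY_ONLY)           // keep the whole RDD in RAM

    data.count() // force materialization so the cache is fully populated

    // ... analysis job whose runtime is measured per cluster size ...

    sc.stop()
  }
}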
[Plot: Spark performance vs. cluster size]
I would expect closer-to-linear scaling, given the number of partitions
and the fact that all of the data is cached.
Can anyone suggest what I should tweak in order to improve performance?
Or perhaps offer an explanation for the behavior I'm witnessing?
Yadid