Re: spark performance non-linear response

Yadid Ayzenberg Wed, 07 Oct 2015 08:43:31 -0700

Some relevant information I left out:

I'm running a transformation; no shuffles occur, and at the end I perform a lookup of 4 partitions on the driver.




On 10/7/15 11:26 AM, Yadid Ayzenberg wrote:

Hi All,
I'm using Spark 1.4.1 to analyze a largish data set (several gigabytes of data). The RDD is partitioned into 2048 partitions, which are more or less equal in size and entirely cached in RAM. I evaluated the performance on several cluster sizes, and I am witnessing a non-linear (power-law) performance improvement as the cluster size increases (plot below). Each node has 4 cores, and each worker is configured to use 10 GB of RAM.
[Plot: Spark performance]
I would expect a more linear response, given the number of partitions and the fact that all of the data is cached. Can anyone suggest what I should tweak in order to improve the performance?
Or perhaps explain the behavior I'm witnessing?
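
To make the linear expectation concrete, here is a minimal Python sketch of the ideal case. It is only a back-of-the-envelope model, not a measurement. It takes the 2048 partitions and 4 cores per node from above, and assumes that each core runs one task at a time, that all tasks take the same time, and that there is no scheduling or driver overhead. The function names and the list of cluster sizes are hypothetical, chosen for illustration.

```python
import math

PARTITIONS = 2048      # number of RDD partitions, as stated above
CORES_PER_NODE = 4     # cores per node, as stated above


def task_waves(nodes, partitions=PARTITIONS, cores_per_node=CORES_PER_NODE):
    """Rounds of tasks needed when every core runs one task at a time."""
    return math.ceil(partitions / (nodes * cores_per_node))


def ideal_speedup(nodes, baseline_nodes=1):
    """Speedup over the baseline if run time were proportional to task waves."""
    return task_waves(baseline_nodes) / task_waves(nodes)


if __name__ == "__main__":
    # Hypothetical cluster sizes; substitute the sizes actually measured.
    for nodes in (1, 2, 4, 8, 16, 32):
        print(f"{nodes:>2} nodes: {task_waves(nodes):>3} waves, "
              f"ideal speedup {ideal_speedup(nodes):.1f}x")
```

Under these assumptions one node needs 512 waves and 8 nodes need 64, so the ideal speedup is 8x. Comparing measured run times against this curve shows how far the observed scaling departs from linear.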

Yadid
