OK, the next question then is: if this is wall-clock time for the whole process, I wonder whether you are just measuring the time taken by the longest single task. I'd expect the time taken by the longest straggler task to follow a distribution like this. That is, how balanced are the partitions?
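To make the straggler point concrete, here is a minimal scheduling sketch (plain Python, not Spark itself): a stage's wall-clock time is when the last task slot finishes, so one slow task puts a floor under the stage time no matter how many slots you add. The 2048 tasks and 4 cores per node come from the thread; the task-time distribution and the single 60-unit straggler are invented purely for illustration.

```python
import random

def stage_wall_time(task_times, slots):
    """Greedy scheduling: each task goes to the slot that frees up
    earliest. The stage's wall-clock time is the latest finish time."""
    finish = [0.0] * slots
    for t in task_times:
        i = finish.index(min(finish))  # earliest-free slot
        finish[i] += t
    return max(finish)

random.seed(42)
# 2048 tasks, mostly uniform, plus one invented straggler partition
tasks = [random.uniform(1.0, 2.0) for _ in range(2048)]
tasks[0] = 60.0

for nodes in (4, 8, 16, 32):
    slots = nodes * 4  # 4 cores per node, as described in the thread
    print(f"{nodes:2d} nodes -> stage time {stage_wall_time(tasks, slots):6.1f}")
```

Past the point where the straggler dominates, doubling the cluster barely moves the stage time, which would produce exactly the kind of sub-linear (power-like) curve in the plot.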
Are you running so many executors that nodes are bottlenecking on CPU, or swapping?

On Wed, Oct 7, 2015 at 4:42 PM, Yadid Ayzenberg <ya...@media.mit.edu> wrote:

> Additional missing relevant information:
>
> I'm running a transformation, there are no shuffles occurring, and at the
> end I'm performing a lookup of 4 partitions on the driver.
>
> On 10/7/15 11:26 AM, Yadid Ayzenberg wrote:
>
> Hi All,
>
> I'm using Spark 1.4.1 to analyze a largish data set (several gigabytes
> of data). The RDD is partitioned into 2048 partitions which are more or
> less equal and entirely cached in RAM.
> I evaluated the performance on several cluster sizes, and am witnessing a
> non-linear (power) performance improvement as the cluster size increases
> (plot below). Each node has 4 cores and each worker is configured to use
> 10GB of RAM.
>
> [image: Spark performance]
>
> I would expect a more linear response given the number of partitions and
> the fact that all of the data is cached.
> Can anyone suggest what I should tweak in order to improve the performance?
> Or perhaps provide an explanation as to the behavior I'm witnessing?
>
> Yadid
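One way to verify the "more or less equal" claim is to pull the per-partition record counts and look at the skew. In PySpark that list can be obtained with `rdd.glom().map(len).collect()`; the analysis below is plain Python over such a list, and the sample counts (including the one skewed partition) are invented stand-in data.

```python
# Per-partition record counts, e.g. obtained in PySpark with:
#   sizes = rdd.glom().map(len).collect()
# The list below is invented sample data standing in for that result.
sizes = [5000] * 2045 + [5200, 4800, 52000]  # one badly skewed partition

mean = sum(sizes) / len(sizes)
print(f"partitions:              {len(sizes)}")
print(f"mean records/partition:  {mean:.0f}")
print(f"max records/partition:   {max(sizes)}")
print(f"max/mean skew factor:    {max(sizes) / mean:.1f}x")
```

A max/mean factor well above 1 means one task is doing several partitions' worth of work, and that task's duration (not the partition count) sets the stage's wall-clock time. The Spark web UI's per-task duration table for the stage tells the same story without any code.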