Hi,
usually you can solve this in two steps (rough sketch below):
1. repartition the RDD so it has more partitions
2. play with the shuffle memory fraction
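
Something like this, assuming you're on a pre-1.6 release with the legacy
memory manager; the input path, partition count, and fraction values below
are placeholders to illustrate the idea, not numbers from this thread:

import org.apache.spark.{SparkConf, SparkContext}

object ShuffleTuningSketch {
  def main(args: Array[String]): Unit = {
    // Step 2: legacy (pre-1.6) fixed fractions; values here are illustrative only
    val conf = new SparkConf()
      .setAppName("shuffle-tuning-sketch")
      .set("spark.shuffle.memoryFraction", "0.4") // default 0.2: more room for shuffle buffers
      .set("spark.storage.memoryFraction", "0.4") // default 0.6: give back cache space you don't use
    val sc = new SparkContext(conf)

    // Step 1: more, smaller partitions so each task's shuffle data fits in memory
    val data = sc.textFile("hdfs:///some/input")   // placeholder path
    val counts = data.repartition(2000)            // partition count depends on your data volume
      .map(line => (line.length, 1))
      .reduceByKey(_ + _)
    println(counts.count())

    sc.stop()
  }
}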

In Spark 1.6 the cache and shuffle memory fractions are adjusted automatically by the unified memory manager.
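
If you're on 1.6 and still see heavy spilling, the knobs moved to the unified
memory manager. A rough sketch of the relevant settings (e.g. in spark-shell
or before creating the SparkContext); the values shown are just the 1.6
defaults, adjust for your job:

import org.apache.spark.SparkConf

// Unified memory manager (Spark 1.6+): execution (shuffle) and storage borrow
// from each other on demand, so these usually only need touching if spills persist.
val conf = new SparkConf()
  .set("spark.memory.fraction", "0.75")          // heap share for execution + storage (1.6 default)
  .set("spark.memory.storageFraction", "0.5")    // part of that protected for cached blocks
  // .set("spark.memory.useLegacyMode", "true")  // fall back to the old fixed fractions if needed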

On 5 February 2016 at 23:07, Corey Nolet <cjno...@gmail.com> wrote:

> I just recently had a discovery that my jobs were taking several hours to
> complete because of excess shuffle spills. What I found was that when I
> hit the point where I didn't have enough memory for the shuffles to
> store all of their file consolidations at once, they could spill so many
> times that my job's runtime increased by orders of magnitude
> (and sometimes the job failed altogether).
>
> I've played with all the tuning parameters I can find. To speed the
> shuffles up, I tuned the akka threads to different values. I also tuned the
> shuffle buffering a tad (both up and down).
>
> I feel like I see a weak point here. The mappers are sharing memory space
> with the reducers, and the shuffles need enough memory to consolidate and
> pull; otherwise they will need to spill and spill and spill. What I've
> noticed about my jobs is that this is the difference between them taking
> 30 minutes and 4 hours or more. Same job, just different memory tuning.
>
> I've found that, as a result of the spilling, I'm better off not caching
> any data in memory, lowering my storage fraction to 0, and hoping I've
> given my shuffles enough memory that my data doesn't continuously spill.
> Is this the way it's supposed to be? It makes things hard because it seems
> to force memory limits on my job; otherwise it could take orders of
> magnitude longer to execute.
>
>
