Hi,

Usually you can solve this in two steps:

1. Repartition the RDD so it has more (and therefore smaller) partitions.
2. Play with the shuffle memory fraction.
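To make those two steps concrete, here is a minimal sketch against the pre-1.6 ("legacy") memory settings; the app name, input path, partition count, and fraction values are illustrative, not recommendations:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-tuning-sketch")
  // Legacy (pre-1.6) knobs: give shuffles more room and shrink the cache,
  // since a job that spills this much is probably not caching much anyway.
  .set("spark.shuffle.memoryFraction", "0.5")   // default is 0.2
  .set("spark.storage.memoryFraction", "0.2")   // default is 0.6

val sc = new SparkContext(conf)

// Step 1: more (smaller) partitions, so each shuffle task holds less data
// in memory at once and is less likely to spill to disk.
val input = sc.textFile("hdfs:///path/to/input")   // hypothetical path
val repartitioned = input.repartition(2000)        // illustrative count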
In Spark 1.6 the cache vs. shuffle memory fractions are adjusted automatically; a config sketch for 1.6 follows below the quoted message.

On 5 February 2016 at 23:07, Corey Nolet <cjno...@gmail.com> wrote:

> I just recently had a discovery that my jobs were taking several hours to complete because of excess shuffle spills. What I found was that when I hit the high point where I didn't have enough memory for the shuffles to store all of their file consolidations at once, it could spill so many times that it caused my job's runtime to increase by orders of magnitude (and sometimes fail altogether).
>
> I've played with all the tuning parameters I can find. To speed the shuffles up, I tuned the akka threads to different values. I also tuned the shuffle buffering a tad (both up and down).
>
> I feel like I see a weak point here. The mappers are sharing memory space with the reducers, and the shuffles need enough memory to consolidate and pull; otherwise they will need to spill and spill and spill. What I've noticed about my jobs is that this is the difference between them taking 30 minutes and 4 hours or more. Same job, just different memory tuning.
>
> I've found that, as a result of the spilling, I'm better off not caching any data in memory and lowering my storage fraction to 0, and still hoping I was able to give my shuffles enough memory that my data doesn't continuously spill. Is this the way it's supposed to be? It makes it hard because it seems like it forces the memory limits on my job; otherwise it could take orders of magnitude longer to execute.
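For completeness, a rough sketch of the 1.6-era equivalents (the values shown are just the 1.6 defaults; extra tuning is usually unnecessary because execution and storage borrow from each other automatically under the unified memory manager):

import org.apache.spark.SparkConf

// Spark 1.6 unified memory: one region shared by execution (shuffles,
// joins, sorts) and storage (cached blocks).
val conf16 = new SparkConf()
  .setAppName("unified-memory-sketch")
  .set("spark.memory.fraction", "0.75")        // size of the unified region (1.6 default)
  .set("spark.memory.storageFraction", "0.5")  // portion shielded for cached blocks
  .set("spark.memory.useLegacyMode", "false")  // "true" restores the old 1.5-style fractions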