Here's the full set of shuffle-behavior configuration options for that:
https://spark.apache.org/docs/latest/configuration.html#shuffle-behavior
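For instance, something like the sketch below. The values are placeholders to illustrate the knobs, not recommendations, and some keys were renamed or removed across Spark versions (spark.shuffle.memoryFraction in particular is a 1.x setting), so check the page above for the ones your version supports:

    import org.apache.spark.{SparkConf, SparkContext}

    // A minimal sketch of shuffle-related tuning knobs (values are illustrative):
    val conf = new SparkConf()
      .setAppName("shuffle-tuning-example")
      // Spark 1.x: fraction of the executor heap usable for shuffle buffers
      // before spilling (superseded by unified memory management later on):
      .set("spark.shuffle.memoryFraction", "0.4")
      // Larger in-memory buffer per shuffle file writer => fewer disk I/O calls:
      .set("spark.shuffle.file.buffer", "64k")
      // Compress map outputs and spilled data, trading CPU for disk/network:
      .set("spark.shuffle.compress", "true")
      .set("spark.shuffle.spill.compress", "true")
      // Not strictly a shuffle-behavior key, but often the first lever:
      // more partitions per shuffle => smaller per-task maps => fewer spills:
      .set("spark.default.parallelism", "200")

    val sc = new SparkContext(conf)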
Thanks
Best Regards

On Fri, Jun 26, 2015 at 10:37 PM, igor.berman <igor.ber...@gmail.com> wrote:
> Hi,
> I wanted to get some advice on tuning a Spark application.
> For some of the tasks I see many log entries like this
> (especially when the inputs are sizable):
>
>   Executor task launch worker-38 ExternalAppendOnlyMap: Thread 239 spilling
>   in-memory map of 5.1 MB to disk (272 times so far)
>
> I understand this is connected to shuffles and joins: data is spilled to
> disk to prevent OOM errors. What is the approach to handling this
> situation? I mean, how can I "fix" it - increase parallelism? Add memory
> to the cluster? What else? Any ideas would be welcome.
>
> In general, my app reads N key-value files and iteratively fullOuterJoin-s
> them (i.e., folds by full outer join). Each key is a user id and each value
> is the aggregated statistics for that user, represented by a simple object.
> The N files are the last N days, so to compute the aggregation for today
> I can "combine" the daily aggregations.
>
> Thanks in advance,
> Igor
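Replying below the quote, on the "increase parallelism?" question: for this access pattern, passing an explicit partitioner through the joins is the usual lever. A minimal sketch of the fold you describe, assuming RDD[(Long, Stats)] inputs with a hypothetical Stats class and merge function (neither is shown in the thread):

    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD

    // Hypothetical stand-in for the "simple object" holding per-user statistics.
    case class Stats(count: Long, sum: Double)

    // Hypothetical merge of two daily aggregates for the same user.
    def merge(a: Stats, b: Stats): Stats = Stats(a.count + b.count, a.sum + b.sum)

    // Fold the N daily RDDs with fullOuterJoin, as described above.
    // numPartitions is the parallelism lever: more partitions => smaller
    // per-task hash maps during the join => fewer ExternalAppendOnlyMap spills.
    def combineDays(days: Seq[RDD[(Long, Stats)]],
                    numPartitions: Int): RDD[(Long, Stats)] = {
      val part = new HashPartitioner(numPartitions)
      // Shuffle each input once up front; the joins below then see
      // co-partitioned inputs and avoid any further shuffle.
      days.map(_.partitionBy(part)).reduce { (acc, day) =>
        acc.fullOuterJoin(day, part).mapValues {
          case (Some(a), Some(b)) => merge(a, b)
          case (Some(a), None)    => a
          case (None, Some(b))    => b
          case (None, None)       => sys.error("unreachable") // fullOuterJoin never emits this
        }
      }
    }

Because every RDD shares the same HashPartitioner (and mapValues preserves partitioning), each input is shuffled exactly once by partitionBy and the iterative joins become narrow dependencies; raising numPartitions then directly shrinks the in-memory map each task has to build.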