The configuration options that control shuffle behavior are documented here:
https://spark.apache.org/docs/latest/configuration.html#shuffle-behavior
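
If you want to experiment, a minimal sketch along these lines is a starting
point. The property names and values below are assumptions to verify against
that page for your Spark version (spark.shuffle.memoryFraction applies to the
1.x memory manager):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative values only; check the shuffle-behavior docs for your version.
    val conf = new SparkConf()
      .setAppName("daily-aggregation")
      // Give shuffle aggregation more heap before it starts spilling (1.x setting).
      .set("spark.shuffle.memoryFraction", "0.4")
      // Use more partitions by default for joins and aggregations.
      .set("spark.default.parallelism", "200")

    val sc = new SparkContext(conf)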

Thanks
Best Regards

On Fri, Jun 26, 2015 at 10:37 PM, igor.berman <igor.ber...@gmail.com> wrote:

> Hi,
> I wanted to get some advice on tuning a Spark application.
> For some tasks I see many log entries like this, especially when the
> inputs are large:
>
> Executor task launch worker-38 ExternalAppendOnlyMap: Thread 239 spilling
> in-memory map of 5.1 MB to disk (272 times so far)
>
> I understand this is connected to shuffles and joins, and that data is
> spilled to disk to prevent OOM errors.
> What is the approach to handling this situation, i.e. how can I "fix" it -
> increase parallelism? add memory to the cluster? what else?
> Any ideas would be welcome.
>
> In general, my app reads N key-value files and iteratively fullOuterJoin-s
> them (like folding with a full outer join). Each key is a user id and each
> value is the aggregated statistics for that user, represented by a simple
> object. The N files are the last N days, so to compute today's aggregation
> I can "combine" the daily aggregations.
> Thanks in advance,
> Igor
>
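For the fold over daily files described above, the usual first lever is to give
each fullOuterJoin an explicit partition count (or repartition the inputs), so
that each task's in-memory map is smaller and spills less, before adding memory
to the cluster. A rough sketch, where UserStats, combine() and the object-file
input format are placeholders for the "simple object" and files mentioned in
the post:

    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Placeholder for the per-user aggregated statistics object.
    case class UserStats(count: Long)
    def combine(a: UserStats, b: UserStats): UserStats = UserStats(a.count + b.count)

    def aggregateDays(sc: SparkContext,
                      paths: Seq[String],
                      numPartitions: Int): RDD[(String, UserStats)] = {
      // Assumes each daily file was written with saveAsObjectFile; adjust to the real format.
      val days: Seq[RDD[(String, UserStats)]] =
        paths.map(p => sc.objectFile[(String, UserStats)](p))

      // Fold the daily RDDs with fullOuterJoin, passing an explicit partition
      // count so each shuffle runs with more, smaller tasks.
      days.reduce { (acc, day) =>
        acc.fullOuterJoin(day, numPartitions).mapValues {
          case (Some(a), Some(b)) => combine(a, b)
          case (Some(a), None)    => a
          case (None, Some(b))    => b
          case (None, None)       => UserStats(0) // not produced by fullOuterJoin
        }
      }
    }

Raising numPartitions spreads the same data over more reduce tasks, which
directly shrinks each ExternalAppendOnlyMap; if heavy spilling persists, more
executor memory (or a larger shuffle memory fraction on 1.x) is the next knob.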
