Hi Raj,

Since the number of executor cores equals the number of tasks that can run in parallel in an executor, the 6G of executor memory you configured is effectively shared by 6 concurrent tasks, on top of the memory reserved for caching and task execution. I would suggest increasing --executor-memory, and raising it further if you also increase the number of executor cores.
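For example, one possible starting point would be fewer cores per executor and more memory per executor, so each concurrent task gets a larger share of the heap. The numbers below are purely illustrative, not tuned for your cluster, and `your_job.py` is a placeholder for your actual application:

```shell
# Illustrative only: 3 concurrent tasks sharing 10G instead of 6 sharing 6G,
# i.e. roughly 3.3G of heap per task slot rather than 1G.
spark-submit \
  --num-executors 6 \
  --executor-memory 10G \
  --executor-cores 3 \
  --driver-memory 3G \
  your_job.py
```

You would want to check these against the memory actually available per node in your cluster before applying them.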
You might also want to adjust the split between cache and task-execution memory via the spark.storage.memoryFraction setting. By default it is 0.6, meaning 60% of the heap is reserved for cached RDDs. Lowering it to, say, 0.4 or 0.3 leaves more memory available for task execution.

Hope this helps!

Thanks,
Deng

On Tue, Jun 16, 2015 at 3:09 AM, diplomatic Guru <[email protected]> wrote:
> Hello All,
>
> I have a Spark job that throws "java.lang.OutOfMemoryError: GC overhead
> limit exceeded".
>
> The job is trying to process a file of size 4.5G.
>
> I've tried the following Spark configuration:
>
> --num-executors 6 --executor-memory 6G --executor-cores 6 --driver-memory 3G
>
> I tried increasing the cores and executors, which sometimes works, but
> it then takes over 20 minutes to process the file.
>
> Could I do something to improve the performance, or stop the Java heap
> issue?
>
> Thank you.
>
> Best regards,
>
> Raj.
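P.S. The memoryFraction setting I mentioned above can be passed straight on the command line with --conf. A sketch, keeping your original flags and using an illustrative value of 0.4 (`your_job.py` is a placeholder):

```shell
# Illustrative: shrink the cache fraction so more heap is left for task
# execution. Note spark.storage.memoryFraction applies to the legacy
# memory manager; newer Spark versions use a unified memory model.
spark-submit \
  --conf spark.storage.memoryFraction=0.4 \
  --num-executors 6 \
  --executor-memory 6G \
  --executor-cores 6 \
  --driver-memory 3G \
  your_job.py
```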
