Hi all,

I am trying to run an image-analytics workload on Spark. The images are read in JPEG format and decoded to raw pixel data inside map functions, which makes the partitions grow by roughly an order of magnitude. In addition, I cache some of the data because my pipeline is iterative.
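For context, the pipeline looks roughly like the sketch below. This is simplified and not my actual code; the input path, the javax.imageio decode step, and the storage level are just placeholders to show the shape of the job:

// Simplified sketch of the pipeline described above (placeholder path and decode step).
import java.io.ByteArrayInputStream
import javax.imageio.ImageIO
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object ImagePipelineSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("image-analytics-sketch"))

    // Each element is (file path, stream over the compressed JPEG bytes).
    val jpegs = sc.binaryFiles("hdfs:///images/*.jpg")

    // Decoding inside the map turns each compressed JPEG into a raw pixel
    // buffer, which is where the partition size blows up.
    val rawPixels = jpegs.map { case (path, stream) =>
      val img = ImageIO.read(new ByteArrayInputStream(stream.toArray()))
      val pixels = img.getRGB(0, 0, img.getWidth, img.getHeight, null, 0, img.getWidth)
      (path, pixels)
    }

    // Cached because later iterations reuse the decoded data; serialized
    // storage trades CPU for lower on-heap object overhead.
    rawPixels.persist(StorageLevel.MEMORY_ONLY_SER)

    // ... iterative stages over rawPixels would go here ...
    println(rawPixels.count())
    sc.stop()
  }
}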
I keep hitting OOM errors and "GC overhead limit exceeded" errors, and I have been working around them by increasing the heap size or the number of partitions, but even after doing that there is still high GC pressure. I understand that partitions should be small enough to fit in memory, but when I did the calculation using the cached partition sizes shown in the Spark UI, the individual partitions looked small enough given the heap size and storage fraction.

I would appreciate your input on what else can cause OOM errors in executors. Could caching the data be the problem (SPARK-1777)?

Thank you in advance.
-Supun
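P.S. In case it clarifies what I mean by "the calculation": my back-of-envelope check looks roughly like the snippet below. All the numbers are made up for illustration and assume the unified memory manager defaults; my real settings and partition sizes differ, and only the formula matters.

object MemoryCheckSketch {
  def main(args: Array[String]): Unit = {
    // Made-up figures; substitute real executor settings and Storage-tab sizes.
    val executorHeapBytes    = 8L * 1024 * 1024 * 1024   // e.g. --executor-memory 8g
    val memoryFraction       = 0.6                       // spark.memory.fraction (default)
    val storageFraction      = 0.5                       // spark.memory.storageFraction (default)
    val cachedPartitionBytes = 200L * 1024 * 1024        // per-partition size from the Storage tab
    val concurrentTasks      = 4                         // executor cores

    val unifiedBytes = (executorHeapBytes * memoryFraction).toLong
    val storageBytes = (unifiedBytes * storageFraction).toLong
    val mb = 1024 * 1024

    println(s"Unified (storage + execution) memory: ${unifiedBytes / mb} MB")
    println(s"Storage share protected from eviction: ${storageBytes / mb} MB")
    println(s"Cached partitions that fit in the storage share: ${storageBytes / cachedPartitionBytes}")
    // Even when the cached partitions fit, each running task also needs
    // execution memory plus the transient decoded objects created inside
    // the map, which I suspect is where the GC pressure comes from.
    println(s"Rough execution memory per task: ${(unifiedBytes - storageBytes) / concurrentTasks / mb} MB")
  }
}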