Hi all,

I am trying to run an image-analytics workload on Spark. The images are read in JPEG format and decoded to raw pixel data inside map functions, which makes the partitions grow by roughly an order of magnitude. In addition, I cache some of the data because my pipeline is iterative.
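For context, the pipeline looks roughly like the sketch below. This is simplified and not my actual code; the input path, the javax.imageio decode step, and the storage level are just placeholders to show the shape of the job:

// Simplified sketch of the pipeline described above (placeholder path and decode step).
import java.io.ByteArrayInputStream
import javax.imageio.ImageIO
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object ImagePipelineSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("image-analytics-sketch"))

    // Each element is (file path, stream over the compressed JPEG bytes).
    val jpegs = sc.binaryFiles("hdfs:///images/*.jpg")

    // Decoding inside the map turns each compressed JPEG into a raw pixel
    // buffer, which is where the partition size blows up.
    val rawPixels = jpegs.map { case (path, stream) =>
      val img = ImageIO.read(new ByteArrayInputStream(stream.toArray()))
      val pixels = img.getRGB(0, 0, img.getWidth, img.getHeight, null, 0, img.getWidth)
      (path, pixels)
    }

    // Cached because later iterations reuse the decoded data; serialized
    // storage trades CPU for lower on-heap object overhead.
    rawPixels.persist(StorageLevel.MEMORY_ONLY_SER)

    // ... iterative stages over rawPixels would go here ...
    println(rawPixels.count())
    sc.stop()
  }
}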
I keep hitting OOM errors and "GC overhead limit exceeded" errors, and I have been working around them by increasing the heap size or the number of partitions, but even after doing that there is still high GC pressure. I understand that partitions should be small enough to fit in memory, but when I did the calculation using the cached partition sizes shown in the Spark UI, the individual partitions looked small enough given the heap size and storage fraction.

I would appreciate your input on what else can cause OOM errors in executors. Could caching the data be the problem (SPARK-1777)?

Thank you in advance.
-Supun
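P.S. In case it clarifies what I mean by "the calculation": my back-of-envelope check looks roughly like the snippet below. All the numbers are made up for illustration and assume the unified memory manager defaults; my real settings and partition sizes differ, and only the formula matters.

object MemoryCheckSketch {
  def main(args: Array[String]): Unit = {
    // Made-up figures; substitute real executor settings and Storage-tab sizes.
    val executorHeapBytes    = 8L * 1024 * 1024 * 1024   // e.g. --executor-memory 8g
    val memoryFraction       = 0.6                       // spark.memory.fraction (default)
    val storageFraction      = 0.5                       // spark.memory.storageFraction (default)
    val cachedPartitionBytes = 200L * 1024 * 1024        // per-partition size from the Storage tab
    val concurrentTasks      = 4                         // executor cores

    val unifiedBytes = (executorHeapBytes * memoryFraction).toLong
    val storageBytes = (unifiedBytes * storageFraction).toLong
    val mb = 1024 * 1024

    println(s"Unified (storage + execution) memory: ${unifiedBytes / mb} MB")
    println(s"Storage share protected from eviction: ${storageBytes / mb} MB")
    println(s"Cached partitions that fit in the storage share: ${storageBytes / cachedPartitionBytes}")
    // Even when the cached partitions fit, each running task also needs
    // execution memory plus the transient decoded objects created inside
    // the map, which I suspect is where the GC pressure comes from.
    println(s"Rough execution memory per task: ${(unifiedBytes - storageBytes) / concurrentTasks / mb} MB")
  }
}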