Hi, I am hitting OutOfMemoryError issues with Spark executors, mainly during shuffle: executors get killed with OutOfMemoryError. I have tried setting spark.executor.extraJavaOptions to take a memory dump, but it is not happening.
spark.executor.extraJavaOptions = "-XX:+UseCompressedOops -XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -9 %p; jmap -heap %p > /home/mycorp/npatel/jmap_%p' -XX:HeapDumpPath=/opt/cores/spark -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -Xloggc:/home/mycorp/npatel/insights-jobs/gclogs/gc_%p.log -XX:+PrintGCTimeStamps"

The following is what I see repeatedly in the YARN application logs after the job fails:

# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p kill -9 %p; jmap -heap %p"
# Executing /bin/sh -c "kill 30434 kill -9 30434"...

From the above logs it looks like the Spark executor has '-XX:OnOutOfMemoryError=kill %p' by default, and my custom arguments are then appended to it incorrectly.

The following is the Linux process info for one particular executor, which confirms the above:

mycorp 29113 29109 99 08:56 ? 04:13:46 /usr/java/jdk1.7.0_51/bin/java -Dorg.jboss.netty.epollBugWorkaround=true -server -XX:OnOutOfMemoryError=kill %p -Xms23000m -Xmx23000m -XX:+UseCompressedOops -XX:NewRatio=2 -XX:ConcGCThreads=2 -XX:ParallelGCThreads=2 -XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError=kill -9 %p; jmap -heap %p > /home/mycorp/npatel/jmap_%p -XX:HeapDumpPath=/opt/cores/spark -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -Xloggc:/home/mycorp/npatel/gclogs/gc%p.log -XX:+PrintGCTimeStamps -Djava.io.tmpdir=/tmp/hadoop-mycorp/nm-local-dir/usercache/mycorp/appcache/application_1461196034441_24756/container_1461196034441_24756_01_000012/tmp -Dspark.driver.port=43095 -Dspark.akka.threads=32 -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.5.1/logs/userlogs/application_1461196034441_24756/container_1461196034441_24756_01_000012 org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url akka.tcp://sparkDriver@10.250.70.116:43095/user/CoarseGrainedScheduler --executor-id 11 --hostname hdn1.mycorpcorporation.local --cores 6 --app-id application_1461196034441_24756 --user-class-path 
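Based on the behavior above, here is a sketch of the configuration I believe I should be passing instead (untested; all paths are from my environment). Two things I noticed while cleaning this up: the '-' in -XX:-HeapDumpOnOutOfMemoryError actually disables the dump (it needs '+'), and running 'kill -9' before jmap kills the process before jmap can attach, so jmap would have to come first. I also suspect that on JDK 7, -Xloggc does not expand %p, which would explain the literal %p in my GC log file names.

```
# Sketch of corrected options (untested); paths are specific to my setup.
# '+' enables the heap dump on OOM; jmap runs before kill -9, not after.
# With HeapDumpPath pointing at a directory, my understanding is the JVM
# writes java_pid<pid>.hprof there, which gets the pid into the file name.
spark.executor.extraJavaOptions=-XX:+UseCompressedOops \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/opt/cores/spark \
  -XX:OnOutOfMemoryError='jmap -heap %p > /home/mycorp/npatel/jmap_%p; kill -9 %p' \
  -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
  -Xloggc:/home/mycorp/npatel/insights-jobs/gclogs/gc_%p.log
```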
file:/tmp/hadoop-mycorp/nm-local-dir/usercache/mycorp/appcache/application_1461196034441_24756/container_1461196034441_24756_01_000012/__app__.jar

I also tried taking a dump of a running executor using jmap -dump, but it fails with an exception partway through. It still generates some dump if I use the -F option; however, that file seems corrupted and will not load into Eclipse MAT or VisualVM.

So what is the correct way to set these executor opts and ultimately take an executor memory dump? More specifically:

1) How to take a heap dump in a particular location, with the application id and process id in the file name.
2) How to put GC logs in a particular location, with the application id and process id in the file name. Currently it does write them, but with a literal %p in the file name.

Thanks
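PS: in case it is relevant, this is roughly how I was invoking jmap against the running executor. The pid below is a placeholder; my understanding is that jmap must run as the same user that owns the executor process (mycorp here) and use the jmap binary from the same JDK the executor runs on (jdk1.7.0_51), since a mismatch on either is a common cause of attach failures.

```
# Find the executor pid (the executor runs CoarseGrainedExecutorBackend).
jps -l | grep CoarseGrainedExecutorBackend

# Binary dump of live objects; <pid> is a placeholder for the pid found above.
/usr/java/jdk1.7.0_51/bin/jmap -dump:live,format=b,file=/tmp/executor_<pid>.hprof <pid>
```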