Hi,

I am hitting OutOfMemoryError issues with Spark executors, mainly during
shuffle: executors get killed with OutOfMemoryError. I have tried setting
spark.executor.extraJavaOptions to take a memory dump, but it is not
happening.

spark.executor.extraJavaOptions = "-XX:+UseCompressedOops
-XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -9 %p; jmap
-heap %p > /home/mycorp/npatel/jmap_%p'
-XX:HeapDumpPath=/opt/cores/spark -XX:+UseG1GC -verbose:gc
-XX:+PrintGCDetails
-Xloggc:/home/mycorp/npatel/insights-jobs/gclogs/gc_%p.log
-XX:+PrintGCTimeStamps"
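For comparison, this is a minimal sketch of how I would expect to pass these
options at submit time. It assumes the dump needs the plus form of the flag
(-XX:+HeapDumpOnOutOfMemoryError enables the dump; the minus form above
disables it) and that quoting the whole value keeps it as a single conf
entry; paths are from my environment:

```shell
# Sketch only; "..." stands for the rest of my submit arguments.
# Note -XX:+HeapDumpOnOutOfMemoryError (plus enables, minus disables).
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores/spark -XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/home/mycorp/npatel/insights-jobs/gclogs/gc_%p.log" \
  ...
```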

Following is what I see repeatedly in the YARN application logs after the job fails.

# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p
kill -9 %p; jmap -heap %p"
#   Executing /bin/sh -c "kill 30434
kill -9 30434"...

From the above logs it looks like the Spark executor has
'-XX:OnOutOfMemoryError=kill %p' by default, and my custom arguments are
then incorrectly appended to it.


Following is the Linux process info for one particular executor, which
confirms the above.

mycorp   29113 29109 99 08:56 ?        04:13:46
/usr/java/jdk1.7.0_51/bin/java -Dorg.jboss.netty.epollBugWorkaround=true
-server -XX:OnOutOfMemoryError=kill %p -Xms23000m -Xmx23000m
-XX:+UseCompressedOops -XX:NewRatio=2 -XX:ConcGCThreads=2
-XX:ParallelGCThreads=2 -XX:-HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill
-9 %p; jmap -heap %p > /home/mycorp/npatel/jmap_%p
-XX:HeapDumpPath=/opt/cores/spark
-XX:+UseG1GC -verbose:gc -XX:+PrintGCDetails
-Xloggc:/home/mycorp/npatel/gclogs/gc%p.log -XX:+PrintGCTimeStamps
-Djava.io.tmpdir=/tmp/hadoop-mycorp/nm-local-dir/usercache/mycorp/appcache/application_1461196034441_24756/container_1461196034441_24756_01_000012/tmp
-Dspark.driver.port=43095 -Dspark.akka.threads=32
-Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.5.1/logs/userlogs/application_1461196034441_24756/container_1461196034441_24756_01_000012
org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
akka.tcp://sparkDriver@10.250.70.116:43095/user/CoarseGrainedScheduler
--executor-id 11 --hostname hdn1.mycorpcorporation.local --cores 6 --app-id
application_1461196034441_24756 --user-class-path
file:/tmp/hadoop-mycorp/nm-local-dir/usercache/mycorp/appcache/application_1461196034441_24756/container_1461196034441_24756_01_000012/__app__.jar


I also tried taking a dump of a running executor using jmap -dump, but it
fails with an exception partway through. It still generates a dump if I use
the -F option; however, that file seems corrupted and does not load into
Eclipse MAT or VisualVM.
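For reference, these are roughly the commands I ran against the live
executor (the PID is the one from the process listing above):

```shell
# Find the executor PID on the node, then take a binary heap dump.
# -F forces the dump via the serviceability agent when the normal
# attach fails, but the result may be unusable, as I saw.
jps -l | grep CoarseGrainedExecutorBackend
jmap -dump:format=b,file=/tmp/executor_heap.hprof 29113
jmap -F -dump:format=b,file=/tmp/executor_heap_forced.hprof 29113
```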


So what is the correct way to set these executor opts and, ultimately, take
an executor memory dump?

More specifically, how do I:

1) Take a heap dump in a particular location, with the application id and
process id in the file name?
2) Put GC logs in a particular location, with the application id and
process id in the file name? Currently this works, but with a literal %p in
the file name.
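On (1), my working assumption is that when -XX:HeapDumpPath points at a
directory, HotSpot itself names the dump file java_pid<pid>.hprof, which
covers the process id; the application id would then have to come from
outside the JVM, e.g. a per-submit directory (hypothetical layout, with a
timestamp standing in for the YARN application id, which is not known
before submit):

```shell
# Assumption: with a directory as HeapDumpPath, the JVM writes
# java_pid<pid>.hprof into it, so only the application id needs to
# be encoded in the path. The directory must exist on the executor
# nodes; "..." stands for the rest of the submit arguments.
APP_TAG=insights-$(date +%Y%m%d-%H%M%S)
spark-submit \
  --conf "spark.executor.extraJavaOptions=-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/cores/spark/$APP_TAG" \
  ...
```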

Thanks

