I am running a Spark application on YARN with 2 executors, Xms/Xmx set to
32 GB and spark.yarn.executor.memoryOverhead set to 6 GB.
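
For reference, the equivalent settings expressed through SparkConf look
roughly like this (a minimal sketch; the app name is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // Mirrors the settings described above; the overhead is given in MB.
    val conf = new SparkConf()
      .setAppName("memory-overhead-example")               // placeholder name
      .set("spark.executor.instances", "2")                // 2 executors
      .set("spark.executor.memory", "32g")                 // becomes -Xms/-Xmx 32768m
      .set("spark.yarn.executor.memoryOverhead", "6144")   // 6 GB off-heap headroom
    val sc = new SparkContext(conf)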

I am seeing that the app's physical memory keeps increasing until the
container is finally killed by the node manager:

2015-07-25 15:07:05,354 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Container [pid=10508,containerID=container_1437828324746_0002_01_000003] is
running beyond physical memory limits. Current usage: 38.0 GB of 38 GB
physical memory used; 39.5 GB of 152 GB virtual memory used. Killing
container.
Dump of the process-tree for container_1437828324746_0002_01_000003 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 10508 9563 10508 10508 (bash) 0 0 9433088 314 /bin/bash -c
/usr/java/default/bin/java -server -XX:OnOutOfMemoryError='kill %p'
-Xms32768m -Xmx32768m  -Dlog4j.configuration=log4j-executor.properties
-XX:MetaspaceSize=512m -XX:+UseG1GC -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:gc.log
-XX:AdaptiveSizePolicyOutputInterval=1  -XX:+UseGCLogFileRotation
-XX:GCLogFileSize=500M -XX:NumberOfGCLogFiles=1
-XX:MaxDirectMemorySize=3500M -XX:NewRatio=3 -Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=36082
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false -XX:NativeMemoryTracking=detail
-XX:ReservedCodeCacheSize=100M -XX:MaxMetaspaceSize=512m
-XX:CompressedClassSpaceSize=256m
-Djava.io.tmpdir=/data/yarn/datanode/nm-local-dir/usercache/admin/appcache/application_1437828324746_0002/container_1437828324746_0002_01_000003/tmp
'-Dspark.driver.port=43354'
-Dspark.yarn.app.container.log.dir=/opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_000003
org.apache.spark.executor.CoarseGrainedExecutorBackend
akka.tcp://sparkDriver@nn1:43354/user/CoarseGrainedScheduler 1 dn3 6
application_1437828324746_0002 1>
/opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_000003/stdout
2>
/opt/hadoop/logs/userlogs/application_1437828324746_0002/container_1437828324746_0002_01_000003/stderr


The 38 GB limit matches the 32 GB executor heap plus the 6 GB
memoryOverhead. I disabled YARN's yarn.nodemanager.pmem-check-enabled
parameter and saw the physical memory usage climb to about 40 GB.

I checked the total RSS in /proc/<pid>/smaps, and it matched the physical
memory reported by YARN and shown in top.
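
The RSS check was basically summing the Rss entries from smaps, along
these lines (a sketch; the pid is a placeholder for the executor JVM's pid):

    import scala.io.Source

    // Sum all "Rss:" entries (reported in kB) from /proc/<pid>/smaps.
    val pid = "10508"  // placeholder: substitute the executor JVM's pid
    val rssKb = Source.fromFile(s"/proc/$pid/smaps").getLines()
      .filter(_.startsWith("Rss:"))
      .map(_.trim.split("\\s+")(1).toLong)
      .sum
    println(f"Total RSS: ${rssKb / 1024.0 / 1024.0}%.1f GB")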

I have verified that the problem is not in the heap; something is growing
in off-heap/native memory. I used tools like VisualVM but didn't find
anything increasing there. Direct memory (MaxDirectMemory) never exceeded
600 MB, the peak number of active threads was 70-80, total thread stack
size stayed under 100 MB, and metaspace usage was around 60-70 MB.
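
In case it clarifies what was measured, these are roughly the numbers the
platform MBeans expose in-process (a sketch of the checks, not the exact
tooling I used):

    import java.lang.management.{BufferPoolMXBean, ManagementFactory}
    import scala.collection.JavaConverters._

    // Direct (off-heap) buffer usage, the pool bounded by -XX:MaxDirectMemorySize.
    ManagementFactory.getPlatformMXBeans(classOf[BufferPoolMXBean]).asScala
      .find(_.getName == "direct")
      .foreach(p => println(s"direct buffers: ${p.getMemoryUsed / (1024 * 1024)} MB"))

    // Live thread count (thread stacks live outside the heap, roughly threads * -Xss).
    println(s"live threads: ${ManagementFactory.getThreadMXBean.getThreadCount}")

    // Metaspace usage (also off-heap).
    ManagementFactory.getMemoryPoolMXBeans.asScala
      .find(_.getName == "Metaspace")
      .foreach(p => println(s"metaspace used: ${p.getUsage.getUsed / (1024 * 1024)} MB"))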

FYI, I am on Spark 1.2 and Hadoop 2.4.0. My Spark application is based on
Spark SQL; it is HDFS read/write intensive and caches data with Spark SQL's
in-memory caching.
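
The Spark SQL usage is roughly of this shape (a simplified sketch of the
pattern on the Spark 1.2 API; the table and path names are placeholders,
not the real ones):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("sql-cache-sketch"))
    val sqlContext = new SQLContext(sc)
    val events = sqlContext.parquetFile("hdfs:///data/events")  // HDFS-backed input
    events.registerTempTable("events")
    sqlContext.cacheTable("events")  // Spark SQL in-memory columnar cache
    sqlContext.sql("SELECT count(*) FROM events").collect()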

Any help would be highly appreciated, as would any hint on where to look to
debug the memory leak, or any existing tool for that. Let me know if any
other information is needed.


