Hi All,
We have a Spark Streaming v1.4 / Java 8 application that slows down and
eventually runs out of heap space. The less driver memory we allocate, the
faster it happens.
Appended are our Spark configuration and a snapshot of the heap taken with
jmap on the driver process. The counts of RDDInfo,
scala.collection.immutable.$colon$colon, and [C (char array) objects keep
growing for as long as we observe. We also tried G1GC, but it behaves the same.
Our dependency graph contains multiple updateStateByKey() calls. For each,
we explicitly set the checkpoint interval to 240 seconds.
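For context, the pattern looks roughly like this. This is only a sketch against the Spark Streaming 1.4 Java API, not our actual job: the stream name (events), the state type (Long), and the update lambda are placeholders.

```java
// Sketch only -- assumes the Spark Streaming 1.4 Java API.
// "events" and the Long running-count state are placeholders.
import com.google.common.base.Optional;  // Spark 1.x Java API uses Guava's Optional
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import java.util.List;

JavaPairDStream<String, Long> counts = events.updateStateByKey(
        (List<Long> newValues, Optional<Long> state) -> {
            // Fold the batch's new values into the previous state, if any.
            long sum = state.or(0L);
            for (Long v : newValues) {
                sum += v;
            }
            return Optional.of(sum);
        });

// Explicitly checkpoint this state DStream every 240 seconds instead of
// relying on the default (a small multiple of the 15 s batch interval).
counts.checkpoint(Durations.seconds(240));
```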
Our batch interval is set to 15 seconds, and there are no scheduling delays
at the start of the process.
Spark configuration (Spark Driver Memory: 6GB, Spark Executor Memory: 2GB):
spark.streaming.minRememberDuration=180s
spark.ui.showConsoleProgress=false
spark.streaming.receiver.writeAheadLog.enable=true
spark.streaming.unpersist=true
spark.streaming.stopGracefullyOnShutdown=true
spark.streaming.ui.retainedBatches=10
spark.ui.retainedJobs=10
spark.ui.retainedStages=10
spark.worker.ui.retainedExecutors=10
spark.worker.ui.retainedDrivers=10
spark.sql.ui.retainedExecutions=10
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max=128m
num      #instances     #bytes  class name
--
  1:       8828200  565004800  org.apache.spark.storage.RDDInfo
  2:      20794893  499077432  scala.collection.immutable.$colon$colon
  3:       9646097  459928736  [C
  4:       9644398  231465552  java.lang.String
  5:      12760625  204170000  java.lang.Integer
  6:         21326      98632  [B
  7:        556959   44661232  [Lscala.collection.mutable.HashEntry;
  8:       1179788   37753216  java.util.concurrent.ConcurrentHashMap$Node
  9:       1169264   37416448  java.util.Hashtable$Entry
 10:        552707   30951592  org.apache.spark.scheduler.StageInfo
 11:        367107   23084712  [Ljava.lang.Object;
 12:        556948   22277920  scala.collection.mutable.HashMap
 13:          2787   22145568  [Ljava.util.concurrent.ConcurrentHashMap$Node;
 14:        116997   12167688  org.apache.spark.executor.TaskMetrics
 15:        360425    8650200  java.util.concurrent.LinkedBlockingQueue$Node
 16:        360417    8650008  org.apache.spark.deploy.history.yarn.HandleSparkEvent
 17:          8332    8478088  [Ljava.util.Hashtable$Entry;
 18:        351061    8425464  scala.collection.mutable.ArrayBuffer
 19:        116963    8421336  org.apache.spark.scheduler.TaskInfo
 20:        446136    7138176  scala.Some
 21:        211968    5087232  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
 22:        116963    4678520  org.apache.spark.scheduler.SparkListenerTaskEnd
 23:        107679    4307160  org.apache.spark.executor.ShuffleWriteMetrics
 24:         72162    4041072  org.apache.spark.executor.ShuffleReadMetrics
 25:        117223    3751136  scala.collection.mutable.ListBuffer
 26:         81473    3258920  org.apache.spark.executor.InputMetrics
 27:        125903    3021672  org.apache.spark.rdd.RDDOperationScope
 28:         91455    2926560  java.util.HashMap$Node
 29:            89    2917776  [Lscala.concurrent.forkjoin.ForkJoinTask;
 30:        116957    2806968  org.apache.spark.scheduler.SparkListenerTaskStart
 31:          2122    2188568  [Lorg.apache.spark.scheduler.StageInfo;
 32:         16411    1819816  java.lang.Class
 33:         87862    1405792  org.apache.spark.scheduler.SparkListenerUnpersistRDD
 34:         22915     916600  org.apache.spark.storage.BlockStatus
 35:          5887     895568  [Ljava.util.HashMap$Node;
 36:           480         82  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
 37:          7569     834968  [I
 38:          9626     770080  org.apache.spark.rdd.MapPartitionsRDD
 39:         31748     761952  java.lang.Long
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-heap-space-out-of-memory-tp27050.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.