Spark Streaming heap space out of memory

2016-05-30 Thread christian.dancu...@rbc.com
Hi All,

We have a Spark Streaming 1.4 / Java 8 application that slows down and
eventually runs out of heap space. The less driver memory we allocate, the
sooner it happens.

Appended are our Spark configuration and a snapshot of the heap taken
using jmap on the driver process. The RDDInfo, $colon$colon and [C objects
keep growing over time. We also tried G1GC, but the behavior is the same.
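
For anyone trying to reproduce this kind of observation, one way to see which classes are growing is to save two `jmap -histo` outputs some minutes apart and diff them. The following is only a minimal sketch: `HistoDiff` and its methods are hypothetical names (not part of Spark or the JDK), and it assumes the `num: #instances #bytes class name` row layout shown in the histogram below.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: diffs two saved `jmap -histo` snapshots by class.
class HistoDiff {
    // Matches rows such as "   1:   8828200  565004800  org.apache.spark.storage.RDDInfo".
    // Anything after the class name is ignored.
    private static final Pattern ROW =
        Pattern.compile("^\\s*\\d+:\\s+(\\d+)\\s+(\\d+)\\s+(\\S+).*$");

    // Parses histogram text into class name -> bytes.
    // Lines that are not data rows (header, separator, totals) are skipped.
    static Map<String, Long> parseBytes(String histo) {
        Map<String, Long> out = new HashMap<>();
        for (String line : histo.split("\\R")) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                out.merge(m.group(3), Long.parseLong(m.group(2)), Long::sum);
            }
        }
        return out;
    }

    // Byte growth per class (after minus before), largest growth first.
    // Classes missing from the earlier snapshot are counted from zero.
    static List<Map.Entry<String, Long>> growth(String before, String after) {
        Map<String, Long> earlier = parseBytes(before);
        List<Map.Entry<String, Long>> deltas = new ArrayList<>();
        for (Map.Entry<String, Long> e : parseBytes(after).entrySet()) {
            long delta = e.getValue() - earlier.getOrDefault(e.getKey(), 0L);
            deltas.add(new AbstractMap.SimpleEntry<>(e.getKey(), delta));
        }
        deltas.sort((x, y) -> Long.compare(y.getValue(), x.getValue()));
        return deltas;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java HistoDiff <before.txt> <after.txt>");
            return;
        }
        String before = new String(Files.readAllBytes(Paths.get(args[0])));
        String after = new String(Files.readAllBytes(Paths.get(args[1])));
        // Print the ten classes whose byte totals grew the most.
        growth(before, after).stream().limit(10)
            .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

Run it as `java HistoDiff before.txt after.txt`, where the two file names are placeholders for saved `jmap -histo <pid>` output from the driver process.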

Our dependency graph contains multiple updateStateByKey() calls. For each,
we explicitly set the checkpoint interval to 240 seconds.

Our batch interval is set to 15 seconds, and we see no delays at the start
of the process.
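
To make the relationship between these intervals concrete, here is the arithmetic as a small sketch. `IntervalCheck` and `batchesPer` are hypothetical names used only for illustration. With the values from this post, state is checkpointed every 16 batches and the 180-second `spark.streaming.minRememberDuration` spans 12 batches. For comparison, the Spark Streaming programming guide suggests trying a checkpoint interval of 5 to 10 sliding intervals.

```java
import java.time.Duration;

// Hypothetical helper: expresses streaming intervals as a number of batches.
class IntervalCheck {
    // Returns how many whole batches fit in the interval.
    // Throws if the interval is not an exact multiple of the batch interval.
    static long batchesPer(Duration interval, Duration batch) {
        long intervalMs = interval.toMillis();
        long batchMs = batch.toMillis();
        if (batchMs <= 0 || intervalMs % batchMs != 0) {
            throw new IllegalArgumentException(
                interval + " is not a multiple of batch interval " + batch);
        }
        return intervalMs / batchMs;
    }

    public static void main(String[] args) {
        Duration batch = Duration.ofSeconds(15);       // batch interval from the post
        Duration checkpoint = Duration.ofSeconds(240); // checkpoint interval from the post
        Duration remember = Duration.ofSeconds(180);   // spark.streaming.minRememberDuration

        System.out.println("checkpoint every " + batchesPer(checkpoint, batch) + " batches");
        System.out.println("remember at least " + batchesPer(remember, batch) + " batches");
    }
}
```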

Spark configuration (Spark Driver Memory: 6GB, Spark Executor Memory: 2GB):
spark.streaming.minRememberDuration=180s
spark.ui.showConsoleProgress=false
spark.streaming.receiver.writeAheadLog.enable=true
spark.streaming.unpersist=true
spark.streaming.stopGracefullyOnShutdown=true
spark.streaming.ui.retainedBatches=10
spark.ui.retainedJobs=10
spark.ui.retainedStages=10
spark.worker.ui.retainedExecutors=10
spark.worker.ui.retainedDrivers=10
spark.sql.ui.retainedExecutions=10
spark.serializer=org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max=128m

 num     #instances         #bytes  class name
----------------------------------------------
   1:       8828200      565004800  org.apache.spark.storage.RDDInfo
   2:      20794893      499077432  scala.collection.immutable.$colon$colon
   3:       9646097      459928736  [C
   4:       9644398      231465552  java.lang.String
   5:      12760625          20417  java.lang.Integer
   6:         21326          98632  [B
   7:        556959       44661232  [Lscala.collection.mutable.HashEntry;
   8:       1179788       37753216  java.util.concurrent.ConcurrentHashMap$Node
   9:       1169264       37416448  java.util.Hashtable$Entry
  10:        552707       30951592  org.apache.spark.scheduler.StageInfo
  11:        367107       23084712  [Ljava.lang.Object;
  12:        556948       22277920  scala.collection.mutable.HashMap
  13:          2787       22145568  [Ljava.util.concurrent.ConcurrentHashMap$Node;
  14:        116997       12167688  org.apache.spark.executor.TaskMetrics
  15:        360425        8650200  java.util.concurrent.LinkedBlockingQueue$Node
  16:        360417        8650008  org.apache.spark.deploy.history.yarn.HandleSparkEvent
  17:          8332        8478088  [Ljava.util.Hashtable$Entry;
  18:        351061        8425464  scala.collection.mutable.ArrayBuffer
  19:        116963        8421336  org.apache.spark.scheduler.TaskInfo
  20:        446136        7138176  scala.Some
  21:        211968        5087232  io.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry
  22:        116963        4678520  org.apache.spark.scheduler.SparkListenerTaskEnd
  23:        107679        4307160  org.apache.spark.executor.ShuffleWriteMetrics
  24:         72162        4041072  org.apache.spark.executor.ShuffleReadMetrics
  25:        117223        3751136  scala.collection.mutable.ListBuffer
  26:         81473        3258920  org.apache.spark.executor.InputMetrics
  27:        125903        3021672  org.apache.spark.rdd.RDDOperationScope
  28:         91455        2926560  java.util.HashMap$Node
  29:            89        2917776  [Lscala.concurrent.forkjoin.ForkJoinTask;
  30:        116957        2806968  org.apache.spark.scheduler.SparkListenerTaskStart
  31:          2122        2188568  [Lorg.apache.spark.scheduler.StageInfo;
  32:         16411        1819816  java.lang.Class
  33:         87862        1405792  org.apache.spark.scheduler.SparkListenerUnpersistRDD
  34:         22915         916600  org.apache.spark.storage.BlockStatus
  35:          5887         895568  [Ljava.util.HashMap$Node;
  36:           480             82  [Lio.netty.buffer.PoolThreadCache$MemoryRegionCache$Entry;
  37:          7569         834968  [I
  38:          9626         770080  org.apache.spark.rdd.MapPartitionsRDD
  39:         31748         761952  java.lang.Long




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-heap-space-out-of-memory-tp27050.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Processing Time Spikes (Spark Streaming)

2016-06-09 Thread christian.dancu...@rbc.com
What version of Spark are you running? 

Do you see the heap space slowly increase over time?

Have you set the TTL cleaner (spark.cleaner.ttl)?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Processing-Time-Spikes-Spark-Streaming-tp22375p27130.html



Re: Spark Streaming heap space out of memory

2016-06-09 Thread christian.dancu...@rbc.com
The issue was resolved by upgrading Spark to version 1.6.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-heap-space-out-of-memory-tp27050p27131.html