Spark off heap memory leak on Yarn with Kafka direct stream

Apoorva Sareen Mon, 13 Jul 2015 11:07:10 -0700

Hi,

I am running Spark Streaming 1.4.0 on YARN (Apache distribution 2.6.0) with 
Java 1.8.0_45 and the Kafka direct stream. I am using a Spark build with 
Scala 2.11 support.


The issue I am seeing is that both the driver and executor containers gradually 
increase their physical memory usage until YARN kills the container. 
I have configured up to 192 MB of heap and 384 MB of off-heap space for my 
driver, but it eventually runs out.
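
For reference, the container sizes implied by the memory settings in the spark-submit command can be worked out as in the sketch below. The object and method names are hypothetical, and the rounding step assumes the scheduler normalizes requests to a multiple of yarn.scheduler.minimum-allocation-mb; the actual value of that setting on this cluster is not known, so 1024 is only a placeholder.

```scala
object ContainerBudget {
  // Memory requested from YARN in MB: JVM heap plus the configured overhead.
  def requestedMb(heapMb: Int, overheadMb: Int): Int = heapMb + overheadMb

  // Round a request up to the next multiple of the scheduler's minimum
  // allocation (assumption: requests are normalized this way).
  def roundedMb(requested: Int, minAllocationMb: Int): Int =
    ((requested + minAllocationMb - 1) / minAllocationMb) * minAllocationMb

  def main(args: Array[String]): Unit = {
    val minAllocationMb = 1024                 // placeholder, cluster-specific
    val driver = requestedMb(192, 384)         // --driver-memory + driver memoryOverhead
    val executor = requestedMb(128, 256)       // --executor-memory + executor memoryOverhead
    println(s"driver: requested=$driver MB, container=${roundedMb(driver, minAllocationMb)} MB")
    println(s"executor: requested=$executor MB, container=${roundedMb(executor, minAllocationMb)} MB")
  }
}
```

Everything the process uses outside the heap (thread stacks, metaspace, direct buffers, native allocations) has to fit in the gap between the heap size and the container size.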

The heap memory appears to be fine, with regular GC cycles. No 
OutOfMemoryError is ever encountered in any of these runs.
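
One way to narrow down where the non-heap growth comes from is to log the JVM's NIO buffer pools periodically. The sketch below uses the standard java.lang.management API; the object and method names are hypothetical, and it was not part of the runs described here. It only covers NIO buffers, so memory allocated by native code, thread stacks, or metaspace would not show up in it.

```scala
import java.lang.management.{BufferPoolMXBean, ManagementFactory}
import scala.collection.JavaConverters._

object BufferPoolReport {
  // One summary line per JVM buffer pool (on HotSpot, typically "direct" and "mapped").
  def snapshot(): Seq[String] =
    ManagementFactory.getPlatformMXBeans(classOf[BufferPoolMXBean]).asScala.map { pool =>
      s"${pool.getName}: count=${pool.getCount} " +
        s"used=${pool.getMemoryUsed}B capacity=${pool.getTotalCapacity}B"
    }.toList

  def main(args: Array[String]): Unit = snapshot().foreach(println)
}
```

If the "direct" pool stays flat while the container's physical memory keeps growing, the growth is probably coming from somewhere other than direct byte buffers.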

In fact, I am not generating any traffic on the Kafka queues, yet this still 
happens. Here is the code I am using:

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SimpleSparkStreaming extends App {

  val conf = new SparkConf()
  // Batch interval in seconds, defaulting to 1
  val ssc = new StreamingContext(conf, Seconds(conf.getLong("spark.batch.window.size", 1L)))
  ssc.checkpoint("checkpoint")

  val topics = Set(conf.get("spark.kafka.topic.name"))
  val kafkaParams = Map[String, String](
    "metadata.broker.list" -> conf.get("spark.kafka.broker.list"))
  val kafkaStream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
    ssc, kafkaParams, topics)

  // Print the value of every message on the executors
  kafkaStream.foreachRDD(rdd => {
    rdd.foreach(x => {
      println(x._2)
    })
  })
  kafkaStream.print()

  ssc.start()
  ssc.awaitTermination()

}

I am running this on CentOS 7. The spark-submit command is as follows:

./bin/spark-submit --class com.rasa.cloud.prototype.spark.SimpleSparkStreaming \
--conf spark.yarn.executor.memoryOverhead=256 \
--conf spark.yarn.driver.memoryOverhead=384 \
--conf spark.kafka.topic.name=test \
--conf spark.kafka.broker.list=172.31.45.218:9092 \
--conf spark.batch.window.size=1 \
--conf spark.app.name="Simple Spark Kafka application" \
--master yarn-cluster \
--num-executors 1 \
--driver-memory 192m \
--executor-memory 128m \
--executor-cores 1 \
/home/centos/spark-poc/target/lib/spark-streaming-prototype-0.0.1-SNAPSHOT.jar

Any help is greatly appreciated.

Regards,

Apoorva
