Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-18 Thread Tathagata Das
Glad to hear that. :) On Thu, Jun 18, 2015 at 6:25 AM, Ji ZHANG wrote: > Hi, > > We switched from ParallelGC to CMS, and the symptom is gone. > > On Thu, Jun 4, 2015 at 3:37 PM, Ji ZHANG wrote: > >> Hi, >> >> I set spark.shuffle.io.preferDirectBufs to false in SparkConf and this >> setting can
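
For reference, a sketch of how the collector switch mentioned above might be applied in yarn-cluster mode. The flag and property names are standard HotSpot/Spark options; the exact values used in the thread are not shown, so heap sizes and the jar name here are placeholders:

```shell
# Sketch: request CMS instead of ParallelGC for both driver and executors
# (standard HotSpot flag; tune memory sizes to your own job).
spark-submit \
  --master yarn-cluster \
  --conf "spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC" \
  --conf "spark.executor.extraJavaOptions=-XX:+UseConcMarkSweepGC" \
  my-streaming-job.jar
```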

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-18 Thread Ji ZHANG
Hi, We switched from ParallelGC to CMS, and the symptom is gone. On Thu, Jun 4, 2015 at 3:37 PM, Ji ZHANG wrote: > Hi, > > I set spark.shuffle.io.preferDirectBufs to false in SparkConf and this > setting can be seen in web ui's environment tab. But, it still eats memory, > i.e. -Xmx set to 512M

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-04 Thread Ji ZHANG
Hi, I set spark.shuffle.io.preferDirectBufs to false in SparkConf, and this setting can be seen in the web UI's Environment tab. But it still eats memory, e.g. -Xmx is set to 512M but RES grows to 1.5G in half a day. On Wed, Jun 3, 2015 at 12:02 PM, Shixiong Zhu wrote: > Could you set "spark.shuffle.

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Shixiong Zhu
Could you set "spark.shuffle.io.preferDirectBufs" to false to turn off the off-heap allocation of netty? Best Regards, Shixiong Zhu 2015-06-03 11:58 GMT+08:00 Ji ZHANG : > Hi, > > Thanks for you information. I'll give spark1.4 a try when it's released. > > On Wed, Jun 3, 2015 at 11:31 AM, Tathag
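
The suggested setting can be applied either on the command line (as in the later reply) or in code. A minimal SparkConf sketch, assuming the property name exactly as given in the thread (its effect depends on the Spark/Netty version); this is a config fragment, not a complete program:

```scala
import org.apache.spark.SparkConf

// Sketch: ask Netty to avoid off-heap (direct) buffers for shuffle I/O.
val conf = new SparkConf()
  .setAppName("streaming-job")           // placeholder app name
  .set("spark.shuffle.io.preferDirectBufs", "false")
```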

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Ji ZHANG
Hi, Thanks for your information. I'll give Spark 1.4 a try when it's released. On Wed, Jun 3, 2015 at 11:31 AM, Tathagata Das wrote: > Could you try it out with Spark 1.4 RC3? > > Also pinging, Cloudera folks, they may be aware of something. > > BTW, the way I have debugged memory leaks in the pa

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Tathagata Das
Could you try it out with Spark 1.4 RC3? Also pinging Cloudera folks; they may be aware of something. BTW, the way I have debugged memory leaks in the past is as follows. Run with a small driver memory, say 1 GB. Periodically (maybe via a script), take snapshots of the histogram and also do memory dump
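
The periodic-snapshot approach described above could be scripted along these lines. The PID value and snapshot interval are placeholders, and this assumes the JDK's jmap tool is on the PATH of the machine running the driver:

```shell
# Sketch: periodically snapshot the heap histogram and take a full dump
# of a running driver JVM, for later diffing in a heap analyzer.
PID=12345   # hypothetical driver process id
while true; do
  ts=$(date +%Y%m%d-%H%M%S)
  jmap -histo:live "$PID" | head -40 > "histo-$ts.txt"
  jmap -dump:live,format=b,file="heap-$ts.hprof" "$PID"
  sleep 600   # every 10 minutes
done
```

Comparing successive histograms shows which classes grow on-heap; if the heap stays flat while RES grows, the leak is off-heap, as suspected in this thread.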

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-06-02 Thread Ji ZHANG
Hi, Thanks for your reply. Here are the top 30 entries of the jmap -histo:live result:

 num     #instances         #bytes  class name
 ----------------------------------------------
   1:         40802      145083848  [B
   2:         99264       12716112
   3:         99264       12291480
   4:

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Ji ZHANG
Hi, Unfortunately, they're still growing, both driver and executors. I ran the same job in local mode, and everything is fine. On Thu, May 28, 2015 at 5:26 PM, Akhil Das wrote: > Can you replace your counting part with this? > > logs.filter(_.s_id > 0).foreachRDD(rdd => logger.info(rdd.count()))

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Akhil Das
Can you replace your counting part with this? logs.filter(_.s_id > 0).foreachRDD(rdd => logger.info(rdd.count())) Thanks Best Regards On Thu, May 28, 2015 at 1:02 PM, Ji ZHANG wrote: > Hi, > > I wrote a simple test job, it only does very basic operations. for example: > > val lines = Kaf

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Ji ZHANG
Hi, I wrote a simple test job; it only does very basic operations. For example:

    val lines = KafkaUtils.createStream(ssc, zkQuorum, group, Map(topic -> 1)).map(_._2)
    val logs = lines.flatMap { line =>
      try {
        Some(parse(line).extract[Impression])
      } catch {
        case _:
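
The truncated snippet above uses a common pattern: parse each record inside a try, returning Some on success and None on failure, so flatMap silently drops malformed input. A plain-Scala sketch of that pattern, with no Spark dependency and a hypothetical Impression case class standing in for the original (the thread does not show its fields):

```scala
// Hypothetical stand-in for the Impression type in the original job.
case class Impression(sId: Long)

// Parse one line; None for malformed input so flatMap can drop it.
def parseLine(line: String): Option[Impression] =
  try {
    Some(Impression(line.trim.toLong))
  } catch {
    case _: NumberFormatException => None
  }

val lines = Seq("1", "2", "oops", "3")
val impressions = lines.flatMap(parseLine)
// malformed "oops" is dropped, keeping records 1, 2 and 3
```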

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-28 Thread Akhil Das
Hi Zhang, Could you paste your code in a gist? Not sure what you are doing inside the code to fill up memory. Thanks Best Regards On Thu, May 28, 2015 at 10:08 AM, Ji ZHANG wrote: > Hi, > > Yes, I'm using createStream, but the storageLevel param is by default > MEMORY_AND_DISK_SER_2. Besides,

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Ji ZHANG
Hi, Yes, I'm using createStream, but the storageLevel param is by default MEMORY_AND_DISK_SER_2. Besides, the driver's memory is also growing. I don't think Kafka messages will be cached in the driver. On Thu, May 28, 2015 at 12:24 AM, Akhil Das wrote: > Are you using the createStream or createDir

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Akhil Das
Are you using the createStream or createDirectStream API? If it's the former, you can try setting the StorageLevel to MEMORY_AND_DISK (it might slow things down though). Another way would be to try the latter one. Thanks Best Regards On Wed, May 27, 2015 at 1:00 PM, Ji ZHANG wrote: > Hi Akhil, >
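
The two options contrasted above might look like this with the Spark 1.3-era KafkaUtils API; ssc, zkQuorum, group, topic and brokers are placeholders from the surrounding thread, not shown here:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

// Receiver-based stream with an explicit (non-default) StorageLevel:
val receiverStream = KafkaUtils.createStream(
  ssc, zkQuorum, group, Map(topic -> 1),
  StorageLevel.MEMORY_AND_DISK).map(_._2)

// Direct (receiver-less) stream, the alternative suggested:
val directStream = KafkaUtils.createDirectStream[
  String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> brokers), Set(topic)).map(_._2)
```

The direct stream reads from Kafka brokers without a long-running receiver, so no received blocks are cached on executors between batches.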

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Ji ZHANG
Hi Akhil, Thanks for your reply. According to the Streaming tab of the Web UI, the Processing Time is around 400ms, and there's no Scheduling Delay, so I suppose it's not the Kafka messages that eat up the off-heap memory. Or maybe it is, but how to tell? I googled about how to check the off-heap memo

Re: Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-27 Thread Akhil Das
After submitting the job, if you do a ps aux | grep spark-submit then you can see all the JVM params. Are you using the high-level consumer (receiver based) for receiving data from Kafka? In that case, if your throughput is high and the processing delay exceeds the batch interval, then you will hit this memor

Spark Streaming yarn-cluster Mode Off-heap Memory Is Constantly Growing

2015-05-26 Thread Ji ZHANG
Hi, I'm using Spark Streaming 1.3 on CDH5.1 with yarn-cluster mode. I found out that YARN is killing the driver and executor processes because of excessive use of memory. Here's something I tried: 1. Xmx is set to 512M and the GC looks fine (one YGC per 10s), so the extra memory is not used by the heap.
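
The heap-vs-RES discrepancy described here can be confirmed by logging the resident set size of the JVM over time while the heap stays capped by -Xmx. A small sketch, with a hypothetical PID and interval:

```shell
# Sketch: log RES of an executor JVM every minute to confirm that total
# process memory grows even though the Java heap is bounded by -Xmx.
PID=12345   # hypothetical executor process id
while sleep 60; do
  echo "$(date +%T) $(ps -o rss= -p "$PID") KB"
done >> res-growth.log
```

If res-growth.log climbs steadily while GC logs show a stable heap, the growth is off-heap (direct buffers, native allocations), matching the symptom reported in this thread.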