Re: Long running Spark Streaming Job increasing executing time per batch

2015-09-24 Thread Jeremy Smith
I found that a similar issue happens when there is a memory leak in the Spark application (or, in my case, in one of the libraries used by the Spark application). Gradually, objects that are never reclaimed make their way into the old or permanent generation space, reducing the available heap. It causes GC overhead

Re: Long running Spark Streaming Job increasing executing time per batch

2014-06-20 Thread Tathagata Das
In the Spark web UI, you should see the same pattern of stages repeating over time, as the same sequence of stages gets computed in every batch. From that you should be able to get a sense of how long corresponding stages take across different batches, and which stage is actually taking more time,
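
The comparison described above can be made a little more systematic once the per-batch stage durations have been noted down from the web UI. What follows is only a minimal sketch and not something posted in this thread: the stage labels and all the numbers are made up, and `slope` and `rank_stages_by_growth` are hypothetical helper names. It fits a least-squares trend line to each stage's durations and ranks the stages by how quickly they grow from batch to batch.

```python
from statistics import mean


def slope(values):
    """Least-squares slope of values against their index (the batch number)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (v - y_bar) for i, v in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


def rank_stages_by_growth(durations_by_stage):
    """Return (stage, slope) pairs, fastest-growing stage first."""
    pairs = [(name, slope(vals)) for name, vals in durations_by_stage.items()]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up durations in seconds, one entry per consecutive batch.
sample = {
    "stage A": [1.0, 1.1, 0.9, 1.0, 1.0],
    "stage B": [2.0, 2.5, 3.0, 3.5, 4.0],
}

ranking = rank_stages_by_growth(sample)
for name, growth in ranking:
    print(f"{name}: {growth:+.3f} s per batch")
```

A stage whose slope stays near zero is probably not the culprit, while a clearly positive slope points at the stage worth inspecting first.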

Re: Long running Spark Streaming Job increasing executing time per batch

2014-06-19 Thread Skogberg, Fredrik
Hi TD,
> That's quite odd. Yes, with checkpointing the lineage does not increase. Can you tell which stage in the processing of each batch is causing the increase in the processing time?
I haven’t been able to determine exactly which stage is causing the increase in processing time. Any poi

Re: Long running Spark Streaming Job increasing executing time per batch

2014-06-19 Thread Tathagata Das
That's quite odd. Yes, with checkpointing the lineage does not increase. Can you tell which stage in the processing of each batch is causing the increase in the processing time? Also, what are the batch interval and the checkpoint interval? TD On Thu, Jun 19, 2014 at 8:45 AM, Skogberg, Fredrik < fredri