I ran into a similar issue when there was a memory leak in the Spark application (or, in my case, in one of the libraries the Spark application uses). Unclaimed objects gradually make their way into the old (or permanent) generation, shrinking the available heap. GC overhead builds up until the application spends most of its time trying (and failing) to free memory. In some cases this eventually ends in a GC-overhead or OutOfMemory exception, but (at least for me) that can take hours or days to happen.
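To make the failure mode concrete, here is a hypothetical sketch (RecordCache and remember are made-up names, not anything from my actual job or library) of the kind of grow-only structure that produces this pattern: objects held by a long-lived cache survive every young-generation collection, get promoted to the old generation, and never become eligible for GC, so each batch leaves the heap a little fuller than the last.

    import scala.collection.mutable

    // Hypothetical helper object used by the job (or a library it pulls in).
    object RecordCache {
      // JVM-wide map that is only ever added to.
      private val seen = mutable.Map.empty[String, Array[Byte]]

      // Called once per record; nothing is ever evicted, so entries
      // accumulate across batches and end up in the old generation.
      def remember(key: String, payload: Array[Byte]): Unit =
        seen.put(key, payload)
    }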
I eventually identified the memory leak by running the Spark app locally and attaching VisualVM (a free, cross-platform profiler; I don't want to post the link and get flagged as spam, but it is easy to find). Using its memory profiling tab you can see how many instances of each class are live in memory. After letting the application run for a while, it becomes obvious which classes' instance counts only ever grow, which in turn makes it much easier to track down the leak.
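As an illustration of what "running the Spark app locally" means here (the object name LocalProfilingRunner and the socket source are just placeholders, not the original job), a driver set up with a local[*] master runs everything in a single JVM, which VisualVM then lists under its local applications so you can open the memory sampler on that process:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object LocalProfilingRunner {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("leak-hunt")
          .setMaster("local[*]")      // driver + executors in one JVM, easy to attach a profiler

        val ssc = new StreamingContext(conf, Seconds(5))

        // Stand-in input; replace with the same DStream pipeline the production job uses.
        val lines = ssc.socketTextStream("localhost", 9999)
        lines.count().print()         // any output operation keeps the batches running

        ssc.start()
        ssc.awaitTermination()
      }
    }

Giving the local run a deliberately small heap (e.g. -Xmx) can also help, since the leak then shows up in the instance counts after minutes instead of hours.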