I found that a similar issue happens when there is a memory leak in the Spark
application (or, in my case, in one of the libraries used by the Spark
application). Gradually, unreclaimed objects make their way into the old or
permanent generation space, reducing the available heap. It causes GC
overhead
In the Spark web UI, you should see the same pattern of stages repeating
over time, as the same sequence of stages gets computed in every batch. From
that you should be able to get a sense of how long corresponding stages take
across different batches, and which stage is actually taking more time,
Hi TD,
> That's quite odd. Yes, with checkpointing the lineage does not increase. Can
> you tell which stage in the processing of each batch is causing the
> increase in the processing time?
I haven’t been able to determine exactly which stage is causing the
increase in processing time. Any poi
That's quite odd. Yes, with checkpointing the lineage does not increase. Can
you tell which stage in the processing of each batch is causing the
increase in the processing time?
Also, what is the batch interval, and checkpoint interval?
TD
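
One rough way to answer the "which stage" question, once the per-stage
durations for a few consecutive batches have been noted down from the web UI,
is to fit a trend line per stage and see which one grows. The sketch below is
plain Python and does not call any Spark API. The stage names and timings are
made-up placeholders, and stage_growth is a hypothetical helper, not something
Spark provides.

```python
from statistics import mean


def stage_growth(durations_by_stage):
    """Return the least-squares slope (seconds per batch) for each stage.

    durations_by_stage maps a stage label to a list of durations in seconds,
    one entry per consecutive batch.
    """
    slopes = {}
    for stage, durations in durations_by_stage.items():
        n = len(durations)
        if n < 2:
            # A single observation carries no trend information.
            slopes[stage] = 0.0
            continue
        xs = range(n)  # batch index 0, 1, 2, ...
        x_bar = mean(xs)
        y_bar = mean(durations)
        num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, durations))
        den = sum((x - x_bar) ** 2 for x in xs)
        slopes[stage] = num / den
    return slopes


# Hypothetical timings, copied by hand from the stage list for five batches.
durations_by_stage = {
    "stage_0": [1.0, 1.1, 0.9, 1.0, 1.0],
    "stage_1": [2.0, 2.5, 3.0, 3.5, 4.0],
    "stage_2": [0.5, 0.5, 0.6, 0.5, 0.5],
}

slopes = stage_growth(durations_by_stage)
# The stage with the largest positive slope is the one slowing down fastest.
slowest_growing = max(slopes, key=slopes.get)
print(slowest_growing, round(slopes[slowest_growing], 2))
```

A stage whose slope stays near zero is taking roughly constant time per batch,
while a clearly positive slope points at the stage worth looking into first.
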
On Thu, Jun 19, 2014 at 8:45 AM, Skogberg, Fredrik <
fredri