Hi Jason,
The live UI initializes its ElementTrackingStore with an InMemoryStore, so all UI status data is held on the driver heap, which creates an OOM risk:
  /**
   * Create an in-memory store for a live application.
   */
  def createLiveStore(
      conf: SparkConf,
      appStatusSource: Option[AppStatusSource] = None): AppStatusStore = {
    val store = new ElementTrackingStore(new InMemoryStore(), conf)
    val listener = new AppStatusListener(store, conf, true, appStatusSource)
    new AppStatusStore(store, listener = Some(listener))
  }
In addition to the parameters you mentioned, you can try to reduce the
following parameters:
* spark.ui.retainedTasks
* spark.ui.dagGraph.retainedRootRDDs
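
As a rough sketch, all five limits discussed in this thread could be lowered on the SparkConf before the context is created. The numbers below are placeholders I picked for illustration, not recommended values, and the snippet assumes spark-core is on the classpath:

```scala
import org.apache.spark.SparkConf

// Illustrative values only (assumptions); tune them for your workload.
val conf = new SparkConf()
  // Limits already tried in the original message
  .set("spark.ui.retainedJobs", "100")
  .set("spark.ui.retainedStages", "100")
  .set("spark.ui.retainedDeadExecutors", "10")
  // Additional limits suggested in this reply
  .set("spark.ui.retainedTasks", "10000")
  .set("spark.ui.dagGraph.retainedRootRDDs", "100")
```

The same keys can also be passed with --conf on spark-submit. Lower limits mean less history is visible in the UI, so there is a trade-off between heap usage and how far back you can inspect jobs.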
If you can share more information about your situation, that would help.
Best
Qian
> On Aug 3, 2022, at 11:04 AM, Jason Jun <[email protected]> wrote:
>
> Hi there,
>
> We have a Spark driver running 24x7, and it continuously hits OOM
> about every 10 days.
> After analyzing a heap dump, I found that
> org.apache.spark.status.ElementTrackingStore holds 85% of the heap, as
> shown in this image:
> <image.png>
>
> From this JIRA ticket, I gathered that these parameters might be the root cause:
> https://issues.apache.org/jira/browse/SPARK-26395
> spark.ui.retainedDeadExecutors
> spark.ui.retainedJobs
> spark.ui.retainedStages
>
> But that didn't fix it. These changes only delayed the OOM from 1 week to 10 days.
>
> I would really appreciate it if anyone could suggest a solution.
>
> Thanks
> Jason