[ https://issues.apache.org/jira/browse/FLINK-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15976355#comment-15976355 ]
ASF GitHub Bot commented on FLINK-6295:
---------------------------------------

Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/3709

@WangTaoTheTonic It doesn't matter that the job status is ever-changing; we only care about the state at the time of the request.

There are 2 cases to consider when accessing the cache for a given ID:

**a) An EG was cached for the given ID**

In this case we can check the state of the job via `AccessExecutionGraph#getState`. Modify this block in `ExecutionGraphHolder`

```
if (cached != null) {
    return cached;
}
```

to this

```
if (cached != null) {
    if (cached.getState() == JobStatus.SUSPENDED) {
        cache.remove(jid);
    }
    return cached;
}
```

and you're done.

**b) No EG was cached for the given ID**

In this case the status doesn't matter: you ask the JM, and if it returns an EG you add it to the cache. We don't care whether this EG is suspended, because it will be removed from the cache by the next request that comes in.

> use LoadingCache instead of WeakHashMap to lower latency
> --------------------------------------------------------
>
>                 Key: FLINK-6295
>                 URL: https://issues.apache.org/jira/browse/FLINK-6295
>             Project: Flink
>          Issue Type: Bug
>          Components: Webfrontend
>            Reporter: Tao Wang
>            Assignee: Tao Wang
>
> Now in ExecutionGraphHolder, which is used in many handlers, we use a
> WeakHashMap to cache ExecutionGraph(s), which is only sensitive to garbage
> collection.
> The latency is too high when the JVM rarely runs GC, which leaves the status
> of jobs or their tasks out of sync with the real ones.
> LoadingCache is a commonly used cache implementation from the Guava library;
> we can use its time-based eviction to lower the latency of status updates.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
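
For illustration, cases a) and b) from the comment above can be combined into one lookup method. This is only a minimal sketch, not the actual `ExecutionGraphHolder` code: `GraphCacheSketch`, the nested `JobStatus` and `Graph` types, the `String` job ID, the `jobManagerLookup` function, and `isCached` are all hypothetical stand-ins, and a plain `HashMap` replaces the `WeakHashMap` so that the behavior does not depend on garbage collection.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class GraphCacheSketch {

    /** Hypothetical stand-in for Flink's JobStatus; only the values needed here. */
    enum JobStatus { RUNNING, SUSPENDED }

    /** Hypothetical stand-in for AccessExecutionGraph. */
    interface Graph {
        JobStatus getState();
    }

    // Assumption: a plain HashMap instead of the WeakHashMap named in the issue.
    private final Map<String, Graph> cache = new HashMap<>();

    // Assumption: models "ask the JM"; returns null if the job is unknown.
    private final Function<String, Graph> jobManagerLookup;

    GraphCacheSketch(Function<String, Graph> jobManagerLookup) {
        this.jobManagerLookup = jobManagerLookup;
    }

    Graph getExecutionGraph(String jid) {
        Graph cached = cache.get(jid);
        if (cached != null) {
            // Case a): serve the cached graph, but evict it if it is suspended
            // so that the next request goes back to the JobManager.
            if (cached.getState() == JobStatus.SUSPENDED) {
                cache.remove(jid);
            }
            return cached;
        }
        // Case b): nothing cached, so ask the JobManager and cache any result,
        // regardless of its state.
        Graph fetched = jobManagerLookup.apply(jid);
        if (fetched != null) {
            cache.put(jid, fetched);
        }
        return fetched;
    }

    /** Helper for inspecting the cache in this sketch only. */
    boolean isCached(String jid) {
        return cache.containsKey(jid);
    }
}
```

With this shape, a suspended graph is cached by the first request, returned once more and evicted by the second, and fetched again from the JobManager by the third.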