[ https://issues.apache.org/jira/browse/FLINK-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15972943#comment-15972943 ]
ASF GitHub Bot commented on FLINK-6295:
---------------------------------------

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/3709

eh... in charge? Whenever *anything* related to a job is requested from the web UI, the EGHolder is accessed. Suppose you have the job info page (/jobs/:jobid) open in a browser or something. The web UI periodically sends requests to the backend, which asks the EGHolder, which in turn asks the JM if it doesn't find the job in its cache.

Now, if we remove the suspended EG, we will in fact keep polling the JM until the job has been recovered. This is actually the same behavior you would get if the job were suspended and the GC/Guava cache kicked in right away, or if the job were resumed on another JM but you weren't refreshing the web UI (which should redirect to the current leader).

So for adding entries nothing changes, and for removing entries the GC is still mostly in charge. We're just adding a small 2-line branch that invalidates suspended ExecutionGraphs and is triggered whenever a handler accesses the EGHolder.

> use LoadingCache instead of WeakHashMap to lower latency
> --------------------------------------------------------
>
>                 Key: FLINK-6295
>                 URL: https://issues.apache.org/jira/browse/FLINK-6295
>             Project: Flink
>          Issue Type: Bug
>          Components: Webfrontend
>            Reporter: Tao Wang
>            Assignee: Tao Wang
>
> Currently, ExecutionGraphHolder, which is used by many handlers, caches ExecutionGraph(s) in a WeakHashMap, whose entries are only evicted by garbage collection.
> When the JVM runs GC rarely, this latency is too high, so the displayed status of jobs or their tasks no longer matches the real one.
> LoadingCache is a commonly used cache implementation from the Guava library; we can use its time-based eviction to lower the latency of status updates.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
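The access path discussed in this thread (time-based eviction plus a small branch that drops suspended graphs when a handler reads them) can be sketched as follows. This is a minimal, dependency-free Java sketch under stated assumptions, not Flink's ExecutionGraphHolder: `ExecutionGraphCacheSketch`, `GraphSnapshot`, `JobStatus`, and the `loader` function (standing in for the query to the JobManager) are hypothetical names. A hand-rolled time-to-live map replaces Guava's LoadingCache so the example runs with the JDK alone.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not Flink's ExecutionGraphHolder): evicts entries after a
 * fixed time-to-live and also invalidates suspended jobs on access, so the next
 * request goes back to the loader (standing in for the JobManager).
 */
public class ExecutionGraphCacheSketch {

    /** Hypothetical stand-in for a job's status. */
    public enum JobStatus { RUNNING, SUSPENDED, FINISHED }

    /** Hypothetical stand-in for an ExecutionGraph. */
    public static final class GraphSnapshot {
        public final String jobId;
        public final JobStatus status;

        public GraphSnapshot(String jobId, JobStatus status) {
            this.jobId = jobId;
            this.status = status;
        }
    }

    /** A cached graph plus the time at which it was loaded. */
    private static final class Entry {
        final GraphSnapshot graph;
        final long loadedAtMillis;

        Entry(GraphSnapshot graph, long loadedAtMillis) {
            this.graph = graph;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final Function<String, Optional<GraphSnapshot>> loader;
    private final LongSupplier clockMillis;
    private final long ttlMillis;

    public ExecutionGraphCacheSketch(Function<String, Optional<GraphSnapshot>> loader,
                                     LongSupplier clockMillis, long ttlMillis) {
        this.loader = loader;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    public Optional<GraphSnapshot> get(String jobId) {
        long now = clockMillis.getAsLong();
        Entry entry = cache.get(jobId);
        if (entry != null) {
            boolean expired = now - entry.loadedAtMillis >= ttlMillis;
            boolean suspended = entry.graph.status == JobStatus.SUSPENDED;
            if (!expired && !suspended) {
                return Optional.of(entry.graph); // fresh, non-suspended: cache hit
            }
            // Time-based eviction, plus the extra branch that drops suspended graphs.
            cache.remove(jobId, entry);
        }
        // Cache miss: ask the loader (standing in for the JobManager).
        Optional<GraphSnapshot> loaded = loader.apply(jobId);
        loaded.ifPresent(graph -> cache.put(jobId, new Entry(graph, now)));
        return loaded;
    }
}
```

While a job stays suspended, every read falls through to the loader, which mirrors the "keep polling the JM until the job has been recovered" behavior described above. With Guava itself, the time-based part would typically come from `CacheBuilder.newBuilder().expireAfterWrite(...)`, and the suspended-job branch would call `cache.invalidate(jobId)` before `cache.get(jobId)`.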