[ https://issues.apache.org/jira/browse/FLINK-18646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161853#comment-17161853 ]
Andrey Zagrebin edited comment on FLINK-18646 at 7/22/20, 9:57 AM:
-------------------------------------------------------------------

GC gaps do not always occur, so it is not clear how much this slows things down. I agree that the best approach is to try without always calling GC.

I agree it makes sense to continue calling GC only if tryRunPendingCleaners is false.

The original idea was to retry GC only after a lot of waiting ({{RETRIGGER_GC_AFTER_SLEEPS}}). It was a further optimisation compared to the Java code for direct memory allocation.


was (Author: azagrebin):
GC gaps do not always occur, so it is not clear how much this slows things down. I agree that the best approach is to try without always calling GC.

It makes sense to continue calling GC if tryRunPendingCleaners is false.

The original idea was to retry GC only after a lot of waiting ({{RETRIGGER_GC_AFTER_SLEEPS}}). It was a further optimisation compared to the Java code for direct memory allocation.

> Managed memory released check can block RPC thread
> --------------------------------------------------
>
>                 Key: FLINK-18646
>                 URL: https://issues.apache.org/jira/browse/FLINK-18646
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.11.0
>            Reporter: Andrey Zagrebin
>            Priority: Critical
>             Fix For: 1.11.2
>
>         Attachments: log1.png, log2.png
>
>
> UnsafeMemoryBudget#verifyEmpty, called on slot freeing, needs time to wait on
> GC of all allocated/released managed memory. If there are a lot of segments
> to GC then it can take time to finish the check. If slot freeing happens in
> the RPC thread, the GC waiting can block it and the TM risks missing its heartbeat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
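
For readers following the thread, the retry policy discussed in the comment above can be sketched roughly as follows. This is a minimal illustration, not the actual UnsafeMemoryBudget code: the class GcRetrySketch, its constructor, and the reserve/release methods are hypothetical, the cleaner callback is only a stand-in for tryRunPendingCleaners, and the value given to RETRIGGER_GC_AFTER_SLEEPS is assumed because the thread does not state it.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of the GC retry policy; not Flink's implementation. */
final class GcRetrySketch {
    // Assumed value for illustration only; the thread names the constant, not its value.
    static final int RETRIGGER_GC_AFTER_SLEEPS = 3;

    private long available;
    // Stand-in for tryRunPendingCleaners: returns true while cleaners are still pending.
    private final BooleanSupplier tryRunPendingCleaners;
    // GC trigger, injected so it can be counted in tests (System::gc in real use).
    private final Runnable gcTrigger;

    GcRetrySketch(long available, BooleanSupplier tryRunPendingCleaners, Runnable gcTrigger) {
        this.available = available;
        this.tryRunPendingCleaners = tryRunPendingCleaners;
        this.gcTrigger = gcTrigger;
    }

    /** Returns memory to the budget, e.g. from a cleaner. */
    synchronized void release(long size) {
        available += size;
    }

    private synchronized boolean tryReserve(long size) {
        if (available < size) {
            return false;
        }
        available -= size;
        return true;
    }

    /** Tries to reserve size bytes, sleeping at most maxSleeps times. */
    boolean reserve(long size, int maxSleeps) {
        // First attempt: no GC call at all.
        if (tryReserve(size)) {
            return true;
        }
        // Keep running pending cleaners while they report more work.
        boolean cleanersActive;
        do {
            cleanersActive = tryRunPendingCleaners.getAsBoolean();
            if (tryReserve(size)) {
                return true;
            }
        } while (cleanersActive);
        // Cleaners reported nothing pending (false): only now trigger GC.
        gcTrigger.run();
        long sleepMillis = 1;
        for (int sleeps = 0; ; sleeps++) {
            if (tryReserve(size)) {
                return true;
            }
            if (sleeps >= maxSleeps) {
                return false;
            }
            // Re-trigger GC only after a lot of waiting.
            if (sleeps >= RETRIGGER_GC_AFTER_SLEEPS) {
                gcTrigger.run();
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            sleepMillis *= 2; // exponential back-off between attempts
        }
    }

    public static void main(String[] args) {
        GcRetrySketch budget = new GcRetrySketch(0, () -> false, System::gc);
        System.out.println("reserved: " + budget.reserve(64, 5));
    }
}
```

The point of the sketch is that GC is never requested while the cleaners still report pending work, and that once waiting has started it is requested again only after several sleeps have passed.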