[ 
https://issues.apache.org/jira/browse/FLINK-18646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17161853#comment-17161853
 ] 

Andrey Zagrebin edited comment on FLINK-18646 at 7/22/20, 9:57 AM:
-------------------------------------------------------------------

GC gaps do not always occur, so it is not clear how much GC slows things down. 
I agree it is best to try without always calling GC.
I agree it makes sense to continue calling GC only if tryRunPendingCleaners 
returns false.
 The original idea was to retry GC only after a lot of waiting 
({{RETRIGGER_GC_AFTER_SLEEPS}}).
 It was a further optimisation compared to the Java code for direct memory 
allocation.
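
A minimal sketch of the retry policy discussed above, not the actual Flink code. The class name, method name and parameters are hypothetical. In particular, {{retriggerGcAfterSleeps}} only stands in for {{RETRIGGER_GC_AFTER_SLEEPS}}, whose real value is not given here, and the suppliers stand in for the memory check and for tryRunPendingCleaners. GC is requested only when no pending cleaner could be run, and again only after a number of back-off sleeps.

```java
import java.util.function.BooleanSupplier;

public class GcRetrySketch {

    /**
     * Waits until memoryReleased reports true or maxSleeps back-off sleeps have elapsed.
     * All parameter names are assumptions made for illustration.
     */
    static boolean waitForRelease(
            BooleanSupplier memoryReleased,
            BooleanSupplier tryRunPendingCleaners,
            Runnable triggerGc,
            int retriggerGcAfterSleeps,
            int maxSleeps) {
        long sleepMillis = 1;
        int sleeps = 0;
        while (!memoryReleased.getAsBoolean()) {
            if (tryRunPendingCleaners.getAsBoolean()) {
                // A cleaner made progress: re-check without GC or sleeping.
                // This spins for as long as cleaners keep reporting progress.
                continue;
            }
            if (sleeps >= maxSleeps) {
                return false; // gave up waiting
            }
            // Request GC before the first sleep, then again only after a lot of waiting.
            if (sleeps == 0 || sleeps >= retriggerGcAfterSleeps) {
                triggerGc.run();
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and stop waiting
                return false;
            }
            sleepMillis <<= 1; // exponential back-off
            sleeps++;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] gcCalls = {0};
        boolean released = waitForRelease(
                () -> false,            // memory is never released in this demo
                () -> false,            // no pending cleaners
                () -> gcCalls[0]++,     // count GC requests instead of calling System.gc()
                2,
                4);
        System.out.println("released=" + released + ", gcCalls=" + gcCalls[0]);
    }
}
```

In production code the {{triggerGc}} argument would typically be {{System::gc}}. Passing it in as a Runnable keeps the sketch testable without forcing real collections.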


was (Author: azagrebin):
GC gaps do not always occur, so it is not clear how much GC slows things down. 
I agree it is best to try without always calling GC.
 It makes sense to continue calling GC if tryRunPendingCleaners returns false.
 The original idea was to retry GC only after a lot of waiting 
({{RETRIGGER_GC_AFTER_SLEEPS}}).
 It was a further optimisation compared to the Java code for direct memory 
allocation.

> Managed memory released check can block RPC thread
> --------------------------------------------------
>
>                 Key: FLINK-18646
>                 URL: https://issues.apache.org/jira/browse/FLINK-18646
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.11.0
>            Reporter: Andrey Zagrebin
>            Priority: Critical
>             Fix For: 1.11.2
>
>         Attachments: log1.png, log2.png
>
>
> UnsafeMemoryBudget#verifyEmpty, called on slot freeing, needs time to wait 
> for GC of all allocated/released managed memory. If there are a lot of 
> segments to GC, the check can take a long time to finish. If slot freeing 
> happens in the RPC thread, the GC wait can block it and the TM risks missing 
> its heartbeat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
