[ 
https://issues.apache.org/jira/browse/FLINK-19852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231231#comment-17231231
 ] 

Andrey Zagrebin commented on FLINK-19852:
-----------------------------------------

UnsafeMemory usage has indeed become safer after 1.11. We do not simply rely 
on users of UnsafeMemory to always release it explicitly. MemorySegments are 
tracked by the JVM GC to make sure that their memory is reused only once no 
other code refers to them, that is, once they have been GC'ed. GC takes time, 
of course; this is the price for safety. It is very similar to JVM direct 
memory. The problem here is that the limit is relatively small per operator and 
it is exact (no headroom to over-allocate). The usage pattern in TempBarrier is 
the worst case for this safe approach because it tries to (re)allocate all 
segments at once. Hence, it has to wait for GC of all segments between 
iterations (a stop-the-world event). From what I see in 
SpillingBuffer/ListMemorySegmentSource, it does not really need all segments at 
once; the segments are just pulled on demand, one by one. If 
ListMemorySegmentSource also reserved segments one by one, then the GC cost 
would be amortised across segment allocations.
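
To illustrate the one-by-one idea, here is a minimal sketch. It is a toy model, 
not Flink code: ToyBudget and LazySegmentSource are hypothetical names made up 
for this example, plain byte arrays stand in for MemorySegments, and the GC 
tracking itself is not modelled. It only shows that budget is reserved when a 
segment is actually pulled rather than for all segments up front.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only: none of these types are Flink classes. */
final class LazyReservationSketch {

    /** Hypothetical exact budget: a reservation fails once the limit is hit. */
    static final class ToyBudget {
        private long available;

        ToyBudget(long totalBytes) {
            this.available = totalBytes;
        }

        boolean tryReserve(long bytes) {
            if (bytes > available) {
                return false; // exact limit, no over-allocation
            }
            available -= bytes;
            return true;
        }

        void release(long bytes) {
            available += bytes;
        }

        long available() {
            return available;
        }
    }

    /** Hypothetical source that reserves budget per segment, on demand. */
    static final class LazySegmentSource {
        private final ToyBudget budget;
        private final int segmentSize;
        private int remaining;

        LazySegmentSource(ToyBudget budget, int segmentSize, int maxSegments) {
            this.budget = budget;
            this.segmentSize = segmentSize;
            this.remaining = maxSegments;
        }

        /** Returns the next segment, or null if quota or budget is exhausted. */
        byte[] nextSegment() {
            if (remaining == 0 || !budget.tryReserve(segmentSize)) {
                return null;
            }
            remaining--;
            return new byte[segmentSize];
        }
    }

    public static void main(String[] args) {
        ToyBudget budget = new ToyBudget(4 * 1024);
        LazySegmentSource source = new LazySegmentSource(budget, 1024, 4);

        // The consumer pulls only the segments it actually needs (two here).
        List<byte[]> used = new ArrayList<>();
        byte[] segment;
        while (used.size() < 2 && (segment = source.nextSegment()) != null) {
            used.add(segment);
        }

        System.out.println("segments pulled: " + used.size());
        System.out.println("budget still free: " + budget.available());
    }
}
```

In this toy run only two of the four possible segments are reserved, so half of 
the budget is never touched. In the real code, each reservation would 
presumably have to wait for at most a small amount of released memory to be 
GC'ed, instead of the whole operator budget at once.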

> Managed memory released check can block IterativeTask
> -----------------------------------------------------
>
>                 Key: FLINK-19852
>                 URL: https://issues.apache.org/jira/browse/FLINK-19852
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.11.0, 1.10.2, 1.11.1, 1.11.2
>            Reporter: shaomeng.wang
>            Priority: Critical
>         Attachments: image-2020-10-28-17-48-28-395.png, 
> image-2020-10-28-17-48-48-583.png
>
>
> UnsafeMemoryBudget#reserveMemory, called from TempBarrier, has to wait for 
> GC of all allocated/released managed memory at every iteration.
>  
> stack:
> !image-2020-10-28-17-48-48-583.png!
> new TempBarrier in BatchTask
> !image-2020-10-28-17-48-28-395.png!
>  
> This makes iterations much slower than before.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
