[ https://issues.apache.org/jira/browse/FLINK-19852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231231#comment-17231231 ]
Andrey Zagrebin commented on FLINK-19852:
-----------------------------------------

UnsafeMemory usage has indeed become safer after 1.11. We no longer simply expect users of UnsafeMemory to always release it explicitly. MemorySegments are tracked by the JVM GC to make sure that their memory is reused only once no other code refers to them, that is, once they have been garbage collected. GC takes time, of course; this is the price of safety. It is very similar to JVM direct memory.

The problem here is that the limit per operator is relatively small and exact (there is no headroom to over-allocate). The usage pattern in TempBarrier is the worst case for this safe approach because it tries to (re)allocate all segments at once. Hence, it has to wait for GC of all segments between iterations (a stop-the-world event).

From what I see in SpillingBuffer/ListMemorySegmentSource, it does not really need all segments at once; the segments are just pulled on demand, one by one. If ListMemorySegmentSource also reserved segments one by one, the GC cost would be amortised across segment allocations.

> Managed memory released check can block IterativeTask
> -----------------------------------------------------
>
>                 Key: FLINK-19852
>                 URL: https://issues.apache.org/jira/browse/FLINK-19852
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Task
>    Affects Versions: 1.11.0, 1.10.2, 1.11.1, 1.11.2
>            Reporter: shaomeng.wang
>            Priority: Critical
>         Attachments: image-2020-10-28-17-48-28-395.png, image-2020-10-28-17-48-48-583.png
>
>
> UnsafeMemoryBudget#reserveMemory, called from TempBarrier, has to wait
> for GC of all allocated/released managed memory at every iteration.
>
> stack:
> !image-2020-10-28-17-48-48-583.png!
> new TempBarrier in BatchTask
> !image-2020-10-28-17-48-28-395.png!
>
> As a result, iterations are much slower than before.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
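
To illustrate the one-by-one reservation suggested in the comment, here is a minimal, self-contained sketch. It is a toy model, not Flink code: ToyBudget and LazySegmentSource are hypothetical names, the byte arrays stand in for MemorySegments, and the sketch does not model GC-based reclamation at all. It only shows the allocation pattern, where budget is reserved at the moment a segment is pulled rather than for all segments up front.

```java
public class LazySegmentSourceSketch {

    /** Hypothetical stand-in for a memory budget: tracks reserved bytes against a fixed limit. */
    static final class ToyBudget {
        private final long limit;
        private long reserved;

        ToyBudget(long limit) {
            this.limit = limit;
        }

        /** Reserves the given number of bytes, failing if the exact limit would be exceeded. */
        void reserve(long bytes) {
            if (reserved + bytes > limit) {
                throw new IllegalStateException("budget exceeded");
            }
            reserved += bytes;
        }

        long reserved() {
            return reserved;
        }
    }

    /** Hypothetical source that reserves each segment only when it is requested. */
    static final class LazySegmentSource {
        private final ToyBudget budget;
        private final int segmentSize;
        private int remaining;

        LazySegmentSource(ToyBudget budget, int segmentSize, int maxSegments) {
            this.budget = budget;
            this.segmentSize = segmentSize;
            this.remaining = maxSegments;
        }

        /** Returns the next segment, or null once maxSegments have been handed out. */
        byte[] nextSegment() {
            if (remaining == 0) {
                return null;
            }
            budget.reserve(segmentSize); // reservation happens per segment, on demand
            remaining--;
            return new byte[segmentSize];
        }
    }

    public static void main(String[] args) {
        ToyBudget budget = new ToyBudget(4 * 1024);
        LazySegmentSource source = new LazySegmentSource(budget, 1024, 4);

        // Only two of the four possible segments are pulled, so only two are reserved.
        source.nextSegment();
        source.nextSegment();
        System.out.println("reserved after two pulls: " + budget.reserved());
    }
}
```

In this model a consumer that pulls only some of the segments never reserves the rest. Whether the same change is practical in ListMemorySegmentSource depends on how its callers release and account for memory, which this sketch does not cover.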