[ 
https://issues.apache.org/jira/browse/FLINK-15758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Zagrebin reassigned FLINK-15758:
---------------------------------------

    Assignee: Andrey Zagrebin

> Investigate potential out-of-memory problems due to managed unsafe memory 
> allocation
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-15758
>                 URL: https://issues.apache.org/jira/browse/FLINK-15758
>             Project: Flink
>          Issue Type: Task
>          Components: API / DataSet, Runtime / Task
>            Reporter: Andrey Zagrebin
>            Assignee: Andrey Zagrebin
>            Priority: Critical
>             Fix For: 1.11.0
>
>
> After FLINK-13985, managed memory is allocated via sun.misc.Unsafe rather than 
> as direct NIO buffers as before 1.10.
> In FLINK-14894, there was an attempt to release this memory only when all 
> Java handles to the unsafe memory are about to be GC'ed. This is similar to how 
> it worked with direct NIO buffers before 1.10, except that the unsafe memory is 
> not tracked by the direct memory limit (-XX:MaxDirectMemorySize). The problem is 
> that over-allocating unsafe memory does not hit the direct limit and therefore 
> does not trigger a GC, which is the only way to release the memory. As a result, 
> allocation can fail with out-of-memory errors without a GC ever running to 
> release a lot of potentially already unused memory.
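> A minimal Java sketch of what such cleaner-backed allocation could look like, 
> using the public java.lang.ref.Cleaner API (Java 9+) for illustration; the 
> class and field names are illustrative, not Flink's actual implementation. The 
> point is that Unsafe.allocateMemory bypasses the -XX:MaxDirectMemorySize 
> accounting, and the memory is only freed once the owning object becomes 
> unreachable and the GC processes the cleaner:
> {code:java}
> import java.lang.ref.Cleaner;
> import java.lang.reflect.Field;
> 
> final class OffHeapSegmentSketch {
>     private static final sun.misc.Unsafe UNSAFE = getUnsafe();
>     private static final Cleaner CLEANER = Cleaner.create();
> 
>     final long address;
>     final long size;
>     private final Cleaner.Cleanable cleanable;
> 
>     OffHeapSegmentSketch(long size) {
>         this.size = size;
>         // Not counted against -XX:MaxDirectMemorySize, unlike ByteBuffer.allocateDirect.
>         this.address = UNSAFE.allocateMemory(size);
>         final long addr = this.address;
>         // Freed only when this segment object becomes unreachable and the GC runs the cleaner.
>         this.cleanable = CLEANER.register(this, () -> UNSAFE.freeMemory(addr));
>     }
> 
>     private static sun.misc.Unsafe getUnsafe() {
>         try {
>             Field f = sun.misc.Unsafe.class.getDeclaredField("theUnsafe");
>             f.setAccessible(true);
>             return (sun.misc.Unsafe) f.get(null);
>         } catch (ReflectiveOperationException e) {
>             throw new Error("Cannot access sun.misc.Unsafe", e);
>         }
>     }
> }
> {code}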
> We have to investigate further optimisations, such as:
>  * directly monitoring the phantom reference queue of the cleaner (in case the 
> JVM quickly detects that there are no more references to the memory) and 
> explicitly releasing memory that is ready for GC as soon as possible, e.g. 
> after a Task exits
>  * monitoring the amount of allocated memory and blocking allocation until GC 
> releases the occupied memory, instead of failing with out-of-memory immediately 
> (see the reservation sketch after this list)
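> A rough sketch of the second idea, assuming a hypothetical reservation guard 
> (class, method names and limits here are illustrative): before each Unsafe 
> allocation, the allocator reserves the requested bytes against a configured 
> budget; if the budget is exceeded, it hints a GC and retries for a bounded time 
> instead of failing immediately, similar in spirit to what java.nio.Bits does 
> for direct buffers:
> {code:java}
> import java.util.concurrent.atomic.AtomicLong;
> 
> final class UnsafeMemoryReservation {
>     private final AtomicLong reserved = new AtomicLong(0L);
>     private final long limit;          // managed memory budget in bytes
>     private final long maxWaitMillis;  // how long to wait for GC before giving up
> 
>     UnsafeMemoryReservation(long limit, long maxWaitMillis) {
>         this.limit = limit;
>         this.maxWaitMillis = maxWaitMillis;
>     }
> 
>     /** Call before Unsafe.allocateMemory; blocks and triggers GC instead of failing at once. */
>     void reserve(long bytes) throws InterruptedException {
>         final long deadline = System.currentTimeMillis() + maxWaitMillis;
>         while (true) {
>             long current = reserved.get();
>             if (current + bytes <= limit && reserved.compareAndSet(current, current + bytes)) {
>                 return;
>             }
>             if (System.currentTimeMillis() >= deadline) {
>                 throw new OutOfMemoryError("Could not reserve " + bytes + " bytes of managed unsafe memory");
>             }
>             // Hint the JVM to process cleaners of unreachable segments, then retry.
>             System.gc();
>             Thread.sleep(50);
>         }
>     }
> 
>     /** Call from the cleaner action, right after Unsafe.freeMemory. */
>     void release(long bytes) {
>         reserved.addAndGet(-bytes);
>     }
> }
> {code}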
> cc [~sewen] [~trohrmann]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)