[
https://issues.apache.org/jira/browse/FLINK-4094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403533#comment-15403533
]
ramkrishna.s.vasudevan commented on FLINK-4094:
-----------------------------------------------
bq. So, another option to fix this would be to set the MaxDirectMemorySize
parameter properly.
Yes, I agree. But when the job runs in a multi-tenant system alongside other
memory-intensive processes, configuring this may not always be easy. It is a
direct way to solve the problem if one really knows the job's memory needs and
requirements.
Regarding pooling, here are some techniques that can be followed (I am speaking
from having used them in our projects):
-> Just pool the off-heap byte buffers (all of them fixed-size). Once a buffer
is no longer in use, put it back into the pool. If the pool is empty, the
caller would have to wait (a blocking call, which may not be acceptable). So
either create on-heap buffers, which may not be right for this use case (though
it is the safe option), or allocate off-heap buffers dynamically and warn the
user that the pool size should be increased because off-heap buffers are
frequently being allocated dynamically.
-> Another way to avoid segmentation could be chunking. I can see that by
default we create 32K buffers (the page size). Instead we could create, say,
2MB off-heap buffers and hand out a 32K slice at the next offset on every
request. All the 2MB buffers would again be pooled, but once a buffer is taken
from the pool we allocate 32K slices from it. Once a buffer is full, or the
next request does not fit in it, we move on to the next buffer. In turn we can
pool these chunks too: once a chunk is done, we put it back into a chunk pool
and reuse it. But this requires knowing exactly when the task has finished
using that chunk; there must not be any remaining references to it.
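To make the two ideas above concrete, here is a minimal sketch in plain Java. It is not Flink's memory manager: the class names OffHeapBufferPool and PageChunk, their methods, and the warning text are all hypothetical and chosen only for illustration. The pool falls back to dynamic off-heap allocation (with a warning) when it is empty, and the chunk hands out page-sized slices of one larger direct buffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical fixed-size pool of direct buffers with a dynamic fallback. */
final class OffHeapBufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int poolSize;
    private final int bufferSize;
    private int dynamicAllocations; // allocations made because the pool was empty

    OffHeapBufferPool(int poolSize, int bufferSize) {
        this.poolSize = poolSize;
        this.bufferSize = bufferSize;
        for (int i = 0; i < poolSize; i++) {
            free.push(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    /** Returns a pooled buffer, or a freshly allocated one if the pool is empty. */
    synchronized ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        if (buf != null) {
            return buf;
        }
        // Non-blocking fallback: allocate off-heap and tell the user about it.
        dynamicAllocations++;
        System.err.println("WARN: buffer pool empty, allocated dynamically ("
                + dynamicAllocations + " so far); consider a larger pool");
        return ByteBuffer.allocateDirect(bufferSize);
    }

    /** Puts a buffer back; surplus buffers beyond poolSize are dropped for the GC. */
    synchronized void release(ByteBuffer buf) {
        if (free.size() < poolSize) {
            buf.clear();
            free.push(buf);
        }
    }

    synchronized int available() {
        return free.size();
    }

    synchronized int dynamicAllocations() {
        return dynamicAllocations;
    }
}

/** Hypothetical chunk: one large direct buffer handed out in page-sized slices. */
final class PageChunk {
    private final ByteBuffer backing;
    private final int pageSize;
    private int nextOffset;

    PageChunk(int chunkSize, int pageSize) {
        this.backing = ByteBuffer.allocateDirect(chunkSize);
        this.pageSize = pageSize;
    }

    /** Returns the next page-sized slice, or null if the chunk is full. */
    ByteBuffer nextPage() {
        if (nextOffset + pageSize > backing.capacity()) {
            return null; // caller moves on to the next chunk
        }
        ByteBuffer view = backing.duplicate();
        view.position(nextOffset);
        view.limit(nextOffset + pageSize);
        nextOffset += pageSize;
        return view.slice();
    }

    /** Makes the chunk reusable; only safe once no slice is referenced anymore. */
    void reset() {
        nextOffset = 0;
    }
}
```

With a 2MB chunk and 32K pages, `new PageChunk(2 * 1024 * 1024, 32 * 1024)` would hand out 64 slices before `nextPage()` returns null. The sketch does not track outstanding slices, so deciding when `reset()` is safe remains the open problem described above.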
> Off heap memory deallocation might not properly work
> ----------------------------------------------------
>
> Key: FLINK-4094
> URL: https://issues.apache.org/jira/browse/FLINK-4094
> Project: Flink
> Issue Type: Bug
> Components: Local Runtime
> Affects Versions: 1.1.0
> Reporter: Till Rohrmann
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Fix For: 1.1.0
>
>
> A user reported that off-heap memory is not properly deallocated when setting
> {{taskmanager.memory.preallocate:false}} (the default) [1]. This can cause
> the TaskManager process to be killed by the OS.
> It should be possible to execute multiple batch jobs with preallocation
> turned off. Direct memory buffers that are no longer used should be properly
> garbage collected so that the JVM process does not exceed its maximum memory
> bounds.
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/offheap-memory-allocation-and-memory-leak-bug-td12154.html