Hi all

Users report a serious memory leak when submitting jobs continuously in 
session mode on k8s (please refer to FLINK-18712 [1]). I reproduced the 
problem and found that it is caused by memory fragmentation in glibc [2][3]. 
I suggested two ways to fix it:

  *   A quick but not very clean solution: limit glibc's memory pools by 
setting MALLOC_ARENA_MAX to 2.

  *   A more general solution: rebuild the image to install libjemalloc-dev 
and add libjemalloc.so to LD_PRELOAD.
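
As a rough sketch, the two workarounds above could be combined in a container 
entrypoint roughly like this. It is only an illustration, not the actual Flink 
script: the JEMALLOC_LIB variable is hypothetical, and the default path assumes 
a Debian-based x86_64 image with libjemalloc-dev installed.

```shell
#!/bin/sh
# Hypothetical variable; the default path is an assumption for Debian on x86_64.
JEMALLOC_LIB="${JEMALLOC_LIB:-/usr/lib/x86_64-linux-gnu/libjemalloc.so}"

if [ -f "$JEMALLOC_LIB" ]; then
    # General solution: preload jemalloc so it replaces glibc malloc.
    # Keep any LD_PRELOAD value that was already set.
    export LD_PRELOAD="${LD_PRELOAD:+$LD_PRELOAD:}$JEMALLOC_LIB"
else
    # Quick solution: stay on glibc but cap the number of malloc arenas.
    echo "jemalloc not found at $JEMALLOC_LIB, limiting glibc arenas instead" >&2
    export MALLOC_ARENA_MAX=2
fi
```

Both variables only affect processes started after they are exported, so the 
JVM has to be launched from the same shell (or a child of it).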

The reporter eventually adopted the 2nd solution to fix the issue. This made 
me wonder whether we should change our Dockerfile to adopt jemalloc as the 
default memory allocator [4].
From my point of view, we have two choices:

  1.  Introduce another Dockerfile that uses jemalloc as the default memory 
allocator, which means Flink would need to build two new image tags with 
jemalloc while the default images still use glibc.
  2.  Make jemalloc the default memory allocator in our existing Dockerfiles, 
which means Flink would offer Docker images with jemalloc by default.

I prefer the 2nd option. Our company already uses jemalloc as the default 
memory allocator for the JDK in our production environment, after our OS team 
warned us about glibc's memory fragmentation.
Moreover, I found several open source projects that have adopted jemalloc as 
the default memory allocator in their images to resolve memory fragmentation 
problems, e.g. fluent [5] and home-assistant [6].

What do you think of this proposal?

[1] https://issues.apache.org/jira/browse/FLINK-18712
[2] https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=15321
[4] https://issues.apache.org/jira/browse/FLINK-19125
[5] https://docs.fluentbit.io/manual/v/1.0/installation/docker#why-there-is-no-fluent-bit-docker-image-based-on-alpine-linux
[6] https://github.com/home-assistant/core/pull/33237


Best
Yun Tang
