Hi all,

Users have reported a serious memory leak when continuously submitting jobs in session mode on Kubernetes (please refer to FLINK-18712 [1]). I reproduced the problem and found that it is caused by glibc memory fragmentation [2][3]. There are two solutions to fix it:
* A quick but not very clean solution: limit glibc's memory pools by setting MALLOC_ARENA_MAX to 2.
* A more general solution: rebuild the image to install libjemalloc-dev and add libjemalloc.so to LD_PRELOAD.

The reporter eventually adopted the second solution to fix the issue. This made me wonder whether we should change our Dockerfiles to use jemalloc as the default memory allocator [4].

From my point of view, we have two choices:

1. Introduce another Dockerfile that uses jemalloc as the default memory allocator. Flink would then need two additional image tags for the jemalloc-based images, while the default images would still use glibc.
2. Make jemalloc the default memory allocator in our existing Dockerfiles, so that Flink's Docker images use jemalloc by default.

I prefer the second option. Our company already uses jemalloc as the default memory allocator for the JDK in our production environment, after our OS team warned us about glibc's memory fragmentation. Moreover, I found several open source projects that adopt jemalloc as the default memory allocator in their images to resolve memory fragmentation problems, e.g. Fluent Bit [5] and Home Assistant [6].

What do you think about this?

[1] https://issues.apache.org/jira/browse/FLINK-18712
[2] https://www.gnu.org/software/libc/manual/html_mono/libc.html#Freeing-after-Malloc
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=15321
[4] https://issues.apache.org/jira/browse/FLINK-19125
[5] https://docs.fluentbit.io/manual/v/1.0/installation/docker#why-there-is-no-fluent-bit-docker-image-based-on-alpine-linux
[6] https://github.com/home-assistant/core/pull/33237

Best,
Yun Tang