Has anyone encountered the non-heap memory growth issue when using RocksDB as the state backend with memory control enabled? It has plagued us for a long time. Our Flink jobs are deployed on Kubernetes, and they keep getting OOM-killed by Kubernetes because of steadily increasing non-heap memory. We thought we could work around it by adding the environment variable MALLOC_ARENA_MAX=1 for the TaskManagers, but that reduces throughput; in some scenarios throughput drops by 40%.
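For context, this is roughly how we forward the variable today, a minimal sketch assuming flink-conf.yaml is used and that the containerized.*.env.* forwarding prefix is available in your deployment mode; the values are illustrative:

    # flink-conf.yaml (sketch): forward MALLOC_ARENA_MAX=1 to the TaskManager
    # containers via Flink's environment-variable forwarding mechanism.
    containerized.taskmanager.env.MALLOC_ARENA_MAX: "1"
    # The JobManager can get the same treatment if its native memory also grows.
    containerized.master.env.MALLOC_ARENA_MAX: "1"

Limiting glibc to a single malloc arena is what keeps the RSS stable for us, at the cost of the throughput regression described above.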
So we would like to understand the questions below and, ideally, find a better solution. Could anyone help? Thanks in advance!

1. Why do savepoints or full checkpoints trigger this glibc bug? Which method calls trigger it?

2. It doesn't happen during savepoints or full checkpoints of every Flink job, only in some scenarios, such as over-aggregation windows that hold large state. Does anyone know which scenarios, functions, or operators trigger the glibc bug?

3. This troubleshooting item has been in the official documentation for several years. Why has nobody fixed the glibc bug, or made an effort on the Flink side, such as switching to another memory allocator like jemalloc? (A config sketch of the suggested alternatives follows after the quoted documentation.)

Here is the troubleshooting link and the relevant text, FYI:
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_trouble/#container-memory-exceeded

Container Memory Exceeded

If a Flink container tries to allocate memory beyond its requested size (Yarn or Kubernetes), this usually indicates that Flink has not reserved enough native memory. You can observe this either by using an external monitoring system or from the error messages when a container gets killed by the deployment environment.

If you encounter this problem in the JobManager process, you can also enable the JVM Direct Memory limit by setting the jobmanager.memory.enable-jvm-direct-memory-limit<https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#jobmanager-memory-enable-jvm-direct-memory-limit> option to exclude possible JVM Direct Memory leak.

If RocksDBStateBackend<https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/state_backends/#the-rocksdbstatebackend> is used:

* and memory controlling is disabled: You can try to increase the TaskManager's managed memory<https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup/#managed-memory>.
* and memory controlling is enabled and non-heap memory increases during savepoint or full checkpoints: This may happen due to the glibc memory allocator (see glibc bug<https://sourceware.org/bugzilla/show_bug.cgi?id=15321>). You can try to add the environment variable<https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#forwarding-environment-variables> MALLOC_ARENA_MAX=1 for TaskManagers. Alternatively, you can increase the JVM Overhead<https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup/#capped-fractionated-components>.
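For completeness, the alternatives from that section would look roughly like the sketch below. The flink-conf.yaml keys are taken from the linked documentation; the Dockerfile part is only a hypothetical illustration of "using another memory allocator like jemalloc" (it assumes a Debian/Ubuntu based Flink image, so the package name and library path may differ on other base images):

    # flink-conf.yaml (sketch): alternatives suggested by the troubleshooting guide
    jobmanager.memory.enable-jvm-direct-memory-limit: true   # rule out a JVM Direct Memory leak on the JobManager
    taskmanager.memory.jvm-overhead.fraction: 0.2            # default is 0.1; illustrative value
    taskmanager.memory.jvm-overhead.max: 2gb                 # also raise the cap, otherwise the default 1gb limit applies

    # Dockerfile (sketch): preload jemalloc instead of the glibc allocator
    FROM flink:1.18
    RUN apt-get update && apt-get install -y libjemalloc2 && rm -rf /var/lib/apt/lists/*
    ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

We have not measured whether the jemalloc route avoids the throughput penalty we saw with MALLOC_ARENA_MAX=1, which is part of why we are asking.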