Has anyone encountered the non-heap memory growth issue when using RocksDB as the state backend with memory control enabled? It has plagued us for a long time. Our Flink jobs are deployed on Kubernetes, and they keep getting OOM-killed by Kubernetes because of steadily increasing non-heap memory. We thought we could work around it by adding the environment variable MALLOC_ARENA_MAX=1 for the TaskManagers, but that reduces throughput; in some scenarios throughput drops by 40%.
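For context, this is roughly how we forward the variable today, a minimal sketch assuming flink-conf.yaml is used and that the containerized.*.env.* forwarding prefix is available in your deployment mode; the values are illustrative:

    # flink-conf.yaml (sketch): forward MALLOC_ARENA_MAX=1 to the TaskManager
    # containers via Flink's environment-variable forwarding mechanism.
    containerized.taskmanager.env.MALLOC_ARENA_MAX: "1"
    # The JobManager can get the same treatment if its native memory also grows.
    containerized.master.env.MALLOC_ARENA_MAX: "1"

Limiting glibc to a single malloc arena is what keeps the RSS stable for us, at the cost of the throughput regression described above.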
So we would like to understand the questions below and, ideally, find a better solution. Could anyone help? Thanks in advance!

1. Why do savepoints or full checkpoints trigger this glibc bug? Which method calls trigger it?

2. It doesn't happen during savepoints or full checkpoints of every Flink job, only in some scenarios, such as over-aggregation windows that hold large state. Does anyone know which scenarios, functions, or operators trigger the glibc bug?

3. This troubleshooting item has been in the official documentation for several years. Why has nobody fixed the glibc bug, or made an effort on the Flink side, such as switching to another memory allocator like jemalloc? (A config sketch of the suggested alternatives follows after the quoted documentation.)

Here is the troubleshooting link and the relevant text, FYI:
https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_trouble/#container-memory-exceeded

Container Memory Exceeded

If a Flink container tries to allocate memory beyond its requested size (Yarn or Kubernetes), this usually indicates that Flink has not reserved enough native memory. You can observe this either by using an external monitoring system or from the error messages when a container gets killed by the deployment environment.

If you encounter this problem in the JobManager process, you can also enable the JVM Direct Memory limit by setting the jobmanager.memory.enable-jvm-direct-memory-limit<https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#jobmanager-memory-enable-jvm-direct-memory-limit> option to exclude possible JVM Direct Memory leak.

If RocksDBStateBackend<https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/state_backends/#the-rocksdbstatebackend> is used:

* and memory controlling is disabled: You can try to increase the TaskManager's managed memory<https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup/#managed-memory>.
* and memory controlling is enabled and non-heap memory increases during savepoint or full checkpoints: This may happen due to the glibc memory allocator (see glibc bug<https://sourceware.org/bugzilla/show_bug.cgi?id=15321>). You can try to add the environment variable<https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#forwarding-environment-variables> MALLOC_ARENA_MAX=1 for TaskManagers. Alternatively, you can increase the JVM Overhead<https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup/#capped-fractionated-components>.
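For completeness, the alternatives from that section would look roughly like the sketch below. The flink-conf.yaml keys are taken from the linked documentation; the Dockerfile part is only a hypothetical illustration of "using another memory allocator like jemalloc" (it assumes a Debian/Ubuntu based Flink image, so the package name and library path may differ on other base images):

    # flink-conf.yaml (sketch): alternatives suggested by the troubleshooting guide
    jobmanager.memory.enable-jvm-direct-memory-limit: true   # rule out a JVM Direct Memory leak on the JobManager
    taskmanager.memory.jvm-overhead.fraction: 0.2            # default is 0.1; illustrative value
    taskmanager.memory.jvm-overhead.max: 2gb                 # also raise the cap, otherwise the default 1gb limit applies

    # Dockerfile (sketch): preload jemalloc instead of the glibc allocator
    FROM flink:1.18
    RUN apt-get update && apt-get install -y libjemalloc2 && rm -rf /var/lib/apt/lists/*
    ENV LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

We have not measured whether the jemalloc route avoids the throughput penalty we saw with MALLOC_ARENA_MAX=1, which is part of why we are asking.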