Hi Sue Alen,

AFAIK, Flink has encountered memory leak issues with the RocksDB block
cache. The root cause is memory fragmentation in glibc. You may find more
information in FLINK-18712 [1].
Actually, there have been some efforts on the Flink side, such as using
jemalloc as the default allocator in the official Docker image [2].
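
If you build your own image rather than using the official one, one common
workaround is to preload jemalloc for the TaskManager process. A rough
sketch, assuming libjemalloc is already installed in your image (the path
below is just an example and depends on your base image), using the
environment-variable forwarding described in the docs, or a plain env entry
in your pod spec:

    # illustrative only; adjust the library path to your image
    containerized.taskmanager.env.LD_PRELOAD: /usr/lib/x86_64-linux-gnu/libjemalloc.so.2

The official flink-docker images (since 1.12, IIRC) already enable jemalloc
by default, so this should only be needed for custom images.
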
I have no idea why savepoints or full checkpoints would lead to such a bug.
Could you provide more details, such as the Flink version you are using,
the memory metrics, and the checkpoint/savepoint metrics? Also, which
savepoint format (native or canonical) are you taking?
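
BTW, regarding the throughput drop with MALLOC_ARENA_MAX=1: if you forward
the variable through the Flink configuration, it would look roughly like
the line below (again assuming the env forwarding from the docs applies to
your deployment mode; otherwise set it directly in the pod spec):

    containerized.taskmanager.env.MALLOC_ARENA_MAX: 1

You could also experiment with slightly larger values such as 2 or 4, which
sometimes give a better trade-off between fragmentation and throughput,
though I cannot promise that helps in your case.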

[1] https://issues.apache.org/jira/browse/FLINK-18712
[2] https://github.com/apache/flink-docker/pull/43


Best,
Zakelly

On Fri, Dec 29, 2023 at 3:35 PM Sue Alen <alen....@hotmail.com> wrote:

> Does anyone encounter the increasing non-heap memory issue when using
> RocksDB as the state backend with memory controlling enabled? It has
> plagued us for a long time. Our Flink jobs are deployed on Kubernetes, and
> they get killed by Kubernetes with OOM because of the increasing non-heap
> memory. Though we can add the environment variable MALLOC_ARENA_MAX=1 for
> TaskManagers to avoid it, that reduces the throughput; in some scenarios,
> the throughput decreases by 40%.
>
> So we want to find out the answers to the questions below and, ideally, a
> better solution. Could anyone help? Thanks in advance!
>
>
> 1. Why would savepoints or full checkpoints lead to such a glibc bug?
> Which method calls trigger the bug?
>
> 2. It doesn't happen during savepoints or full checkpoints of every Flink
> job, but only in some scenarios, such as over aggregation windows that
> hold large state. Does anyone know which scenarios, functions, or
> operators trigger such a glibc bug?
>
> 3. This troubleshooting note has been in the official documentation for
> several years. Why has nobody fixed the glibc bug or made some effort on
> the Flink side, such as using another memory allocator like jemalloc?
>
>
> Here is the troubleshooting link and description, FYI.
>
>
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_trouble/#container-memory-exceeded
>
> Container Memory Exceeded
>
> If a Flink container tries to allocate memory beyond its requested size
> (Yarn or Kubernetes), this usually indicates that Flink has not reserved
> enough native memory. You can observe this either by using an external
> monitoring system or from the error messages when a container gets killed
> by the deployment environment.
>
> If you encounter this problem in the JobManager process, you can also
> enable the JVM Direct Memory limit by setting the
> jobmanager.memory.enable-jvm-direct-memory-limit<
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#jobmanager-memory-enable-jvm-direct-memory-limit>
> option to exclude possible JVM Direct Memory leak.
>
> If RocksDBStateBackend<
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/ops/state/state_backends/#the-rocksdbstatebackend>
> is used:
>
>   *   and memory controlling is disabled: You can try to increase the
> TaskManager’s managed memory<
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup/#managed-memory
> >.
>   *   and memory controlling is enabled and non-heap memory increases
> during savepoint or full checkpoints: This may happen due to the glibc
> memory allocator (see glibc bug<
> https://sourceware.org/bugzilla/show_bug.cgi?id=15321>). You can try to
> add the environment variable<
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#forwarding-environment-variables>
> MALLOC_ARENA_MAX=1 for TaskManagers.
>
> Alternatively, you can increase the JVM Overhead<
> https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/memory/mem_setup/#capped-fractionated-components
> >.
>
