[ 
https://issues.apache.org/jira/browse/FLINK-39924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Khachatryan updated FLINK-39924:
--------------------------------------
    Priority: Major  (was: Critical)

> Memory fragmentation from jemalloc misconfiguration
> ---------------------------------------------------
>
>                 Key: FLINK-39924
>                 URL: https://issues.apache.org/jira/browse/FLINK-39924
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Configuration
>    Affects Versions: 2.0.2, 2.2.1, 1.20.5, 2.1.3
>            Reporter: Keith Lee
>            Assignee: Keith Lee
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.4.0
>
>
> We observed excessive memory fragmentation in production. Using malloc_stats, 
> we measured the most extreme case at 3.91 GB of fragmentation (10.01 GB 
> resident - 6.1 GB active), which is significant because the pod has a memory 
> limit of 16 GB. 
> We also observed that the jemalloc arena count was higher than the expected 
> default of 4 x number_of_cpu_cores. 
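
The fragmentation figure above is resident minus active memory. A minimal sketch of that arithmetic, using only the numbers quoted in the report (the function names are hypothetical and not part of Flink or jemalloc):

```python
def fragmentation_gb(resident_gb: float, active_gb: float) -> float:
    """Memory held by the allocator but not actively in use."""
    return resident_gb - active_gb


def share_of_limit(fragmentation: float, limit_gb: float) -> float:
    """Fraction of the pod memory limit lost to fragmentation."""
    return fragmentation / limit_gb


# Figures from the report: 10.01 GB resident, 6.1 GB active, 16 GB pod limit.
frag = fragmentation_gb(10.01, 6.1)
print(f"fragmentation: {frag:.2f} GB")                          # 3.91 GB
print(f"share of pod limit: {share_of_limit(frag, 16.0):.1%}")  # 24.4%
```

On these numbers, roughly a quarter of the pod's memory limit is resident but not active.
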
> h2. Why is high jemalloc arena count bad?
> A large number of arenas means many of them are used infrequently, and an 
> infrequently used arena holds its dirty pages for dirty_decay_ms before 
> releasing the memory to the OS. This leaves less memory for the Flink process 
> and the OS page cache, which hurts performance and raises the likelihood of 
> an OOMKill.
> h2. Root cause
> By default, jemalloc sets narenas to 4 * number_of_cpu_cores. However, 
> *jemalloc is not container aware and the value for number_of_cpu_cores 
> is obtained from the host machine* instead of the pod CPU resource 
> configuration. 
> See jemalloc default: 
> [https://github.com/jemalloc/jemalloc/blob/4de3a4c3d1bb4520acdc856ddab3e57a28eb7795/src/jemalloc_init.c#L379-L391]
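
To illustrate the gap between the host-derived and the container-derived arena count, here is a rough sketch that derives a CPU count from a cgroup v2 `cpu.max` value and turns it into a `narenas` setting for `MALLOC_CONF`. It assumes cgroup v2 and a CPU limit expressed as quota and period; the helper names are hypothetical, and the ticket does not say that the actual fix works this way.

```python
import math
from typing import Optional


def cpus_from_cpu_max(cpu_max: str) -> Optional[int]:
    """Parse cgroup v2 cpu.max content: '<quota> <period>' or 'max <period>'."""
    quota, period = cpu_max.split()
    if quota == "max":
        return None  # no CPU limit configured for this cgroup
    # Round fractional CPUs up, and never go below one CPU.
    return max(1, math.ceil(int(quota) / int(period)))


def narenas_for(cpu_max: str, host_cores: int) -> int:
    """4 * CPU count, preferring the container limit over the host core count."""
    cpus = cpus_from_cpu_max(cpu_max)
    if cpus is None:
        cpus = host_cores
    return 4 * min(cpus, host_cores)


def malloc_conf(narenas: int) -> str:
    """Value for the MALLOC_CONF environment variable (jemalloc 'narenas' option)."""
    return f"narenas:{narenas}"


# Hypothetical pod limited to 2 CPUs on a 14-core host.
print(malloc_conf(narenas_for("200000 100000", 14)))  # narenas:8
# Without a limit, the host core count applies: 4 * 14.
print(malloc_conf(narenas_for("max 100000", 14)))     # narenas:56
```

In a real container the `cpu.max` string would typically be read from `/sys/fs/cgroup/cpu.max`; it is passed in as a plain string here so the sketch runs anywhere.
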
> h2. Reproduction and confirmation
> Steps to reproduce can be found here: 
> [https://github.com/leekeiabstraction/flink-docker/tree/reproduce-jemalloc-fragmentation/reproduce-jemalloc-fragmentation]
> The reproduction was run on a 14-core MacBook Pro. We found a reduction of 
> 10.7 % in resident set size (highest anon, below) and a slight performance 
> improvement when narena is configured to 4 * pod CPU count.
> {{============================================================}}
> {{[+] Per-image summary:}}
> {{============================================================}}
> {{  image                          highest anon    avg anon  lowest write-recs  avg write-recs}}
> {{  flink:2.2.1-scala_2.12-java17    1679.3 MiB  1522.6 MiB             186901          207614}}
> {{  flink-2.2.1-narenas4             1499.7 MiB  1301.9 MiB             200945          213198}}
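
The relative changes between the two images can be recomputed from the summary table. A small sketch, with the values copied from the table and a hypothetical helper name:

```python
def pct_change(baseline: float, candidate: float) -> float:
    """Relative change from baseline to candidate, in percent."""
    return (candidate - baseline) / baseline * 100.0


# (baseline flink:2.2.1-scala_2.12-java17, flink-2.2.1-narenas4)
summary = {
    "highest anon (MiB)": (1679.3, 1499.7),
    "avg anon (MiB)": (1522.6, 1301.9),
    "lowest write-recs": (186901, 200945),
    "avg write-recs": (207614, 213198),
}

for metric, (baseline, candidate) in summary.items():
    print(f"{metric:<20} {pct_change(baseline, candidate):+.1f} %")
# highest anon (MiB)   -10.7 %
# avg anon (MiB)       -14.5 %
# lowest write-recs    +7.5 %
# avg write-recs       +2.7 %
```

The 10.7 % figure quoted in the description matches the highest anon column; the average anon reduction comes out somewhat larger.
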



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
