We faced the same issue with Flink 1.16.1. Please try enabling jemalloc as
the memory allocator; it fixed the issue for us.
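
For reference, on a standalone setup jemalloc is usually switched in by
setting LD_PRELOAD to the jemalloc shared library before the TaskManager
starts; the library path depends on the distribution, so I am not quoting
one here. To confirm the running process actually picked it up, one option
is to look for the library in /proc/<pid>/maps. A minimal Python sketch,
assuming Linux and a dynamically loaded jemalloc (the function names are
mine, not part of Flink):

```python
from pathlib import Path


def uses_jemalloc(maps_text):
    """Return True if any mapped file in /proc/<pid>/maps content mentions jemalloc."""
    for line in maps_text.splitlines():
        # Format: address perms offset dev inode [pathname]
        fields = line.split(maxsplit=5)
        if len(fields) == 6 and "jemalloc" in fields[5]:
            return True
    return False


def process_uses_jemalloc(pid):
    """Read the memory map of a live process (Linux only) and check it."""
    return uses_jemalloc(Path(f"/proc/{pid}/maps").read_text())
```

Calling process_uses_jemalloc() with the TaskManager PID should return True
once the allocator is in place. A statically linked jemalloc would not show
up this way.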

On Wed, Sep 6, 2023 at 9:07 PM Kenan Kılıçtepe <kkilict...@gmail.com> wrote:

> Hi,
> Thanks for the answer.
> I will go through the documents you shared.
> Still, it would be great if you could take a look at the numbers below
> and give some tips.
>
>
> At the moment RSS is 46.6GB, although taskmanager.memory.process.size is
> set to 40000m.
>
> GC Statistics:
> 2023-09-06 15:15:03,785 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Memory usage stats: [HEAP: 3703/18208/18208 MB, NON HEAP: 154/175/744 MB (used/committed/max)]
> 2023-09-06 15:15:03,785 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Direct memory stats: Count: 33620, Total Capacity: 1102003811, Used Memory: 1102003812
> 2023-09-06 15:15:03,785 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Off-heap pool stats: [CodeHeap 'non-nmethods': 1/3/7 MB (used/committed/max)], [Metaspace: 87/99/256 MB (used/committed/max)], [CodeHeap 'profiled nmethods': 32/35/116 MB (used/committed/max)], [Compressed Class Space: 10/14/248 MB (used/committed/max)], [CodeHeap 'non-profiled nmethods': 21/22/116 MB (used/committed/max)]
> 2023-09-06 15:15:03,785 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Garbage collector stats: [G1 Young Generation, GC TIME (ms): 30452, GC COUNT: 351], [G1 Old Generation, GC TIME (ms): 0, GC COUNT: 0]
>
> My configuration:
>
> INFO  [] - Final TaskExecutor Memory configuration:
> INFO  [] -   Total Process Memory:          39.063gb (41943040000 bytes)
> INFO  [] -     Total Flink Memory:          37.813gb (40600862720 bytes)
> INFO  [] -       Total JVM Heap Memory:     17.781gb (19092471808 bytes)
> INFO  [] -         Framework:               128.000mb (134217728 bytes)
> INFO  [] -         Task:                    17.656gb (18958254080 bytes)
> INFO  [] -       Total Off-heap Memory:     20.031gb (21508390912 bytes)
> INFO  [] -         Managed:                 18.906gb (20300431360 bytes)
> INFO  [] -         Total JVM Direct Memory: 1.125gb (1207959552 bytes)
> INFO  [] -           Framework:             128.000mb (134217728 bytes)
> INFO  [] -           Task:                  0 bytes
> INFO  [] -           Network:               1024.000mb (1073741824 bytes)
> INFO  [] -     JVM Metaspace:               256.000mb (268435456 bytes)
> INFO  [] -     JVM Overhead:                1024.000mb (1073741824 bytes)
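
As a sanity check on the configuration above, the components can be added
up to confirm that they account for the whole process budget. A small
Python sketch using only the byte counts from the log (the variable names
are mine):

```python
MIB = 1024 * 1024

# Byte counts copied from the "Final TaskExecutor Memory configuration" log.
jvm_heap = 19092471808
managed = 20300431360
jvm_direct = 1207959552
metaspace = 268435456
jvm_overhead = 1073741824

total_flink = jvm_heap + managed + jvm_direct
total_process = total_flink + metaspace + jvm_overhead

# Share of Total Flink Memory that went to managed memory.
managed_share = managed / total_flink
```

The sums match the logged totals (40600862720 and 41943040000 bytes, the
latter being exactly 40000m), so the budget itself is consistent. Managed
memory comes out at exactly 0.5 of Total Flink Memory, which suggests the
fraction in effect for this run was 0.5 rather than the 0.53 quoted in the
first message.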
>
> jcmd output:
>
> ubuntu@dzs-tef-test-01:~/flink/log$ jcmd 1035173  VM.native_memory summary
> 1035173:
>
> Native Memory Tracking:
>
> Total: reserved=23615554KB, committed=21049538KB
> -                 Java Heap (reserved=18644992KB, committed=18644992KB)
>                             (mmap: reserved=18644992KB, committed=18644992KB)
>
> -                     Class (reserved=347038KB, committed=106970KB)
>                             (classes #15959)
>                             (  instance classes #15140, array classes #819)
>                             (malloc=5022KB #72815)
>                             (mmap: reserved=342016KB, committed=101948KB)
>                             (  Metadata:   )
>                             (    reserved=88064KB, committed=86948KB)
>                             (    used=79128KB)
>                             (    free=7820KB)
>                             (    waste=0KB =0.00%)
>                             (  Class space:)
>                             (    reserved=253952KB, committed=15000KB)
>                             (    used=11278KB)
>                             (    free=3722KB)
>                             (    waste=0KB =0.00%)
>
> -                    Thread (reserved=2404259KB, committed=262791KB)
>                             (thread #2328)
>                             (stack: reserved=2393052KB, committed=251584KB)
>                             (malloc=8481KB #13970)
>                             (arena=2726KB #4654)
>
> -                      Code (reserved=252334KB, committed=67866KB)
>                             (malloc=4650KB #21507)
>                             (mmap: reserved=247684KB, committed=63216KB)
>
> -                        GC (reserved=800181KB, committed=800181KB)
>                             (malloc=74637KB #63221)
>                             (mmap: reserved=725544KB, committed=725544KB)
>
> -                  Compiler (reserved=20432KB, committed=20432KB)
>                             (malloc=20300KB #8557)
>                             (arena=133KB #5)
>
> -                  Internal (reserved=21883KB, committed=21871KB)
>                             (malloc=21839KB #29146)
>                             (mmap: reserved=44KB, committed=32KB)
>
> -                     Other (reserved=1082212KB, committed=1082212KB)
>                             (malloc=1082212KB #34463)
>
> -                    Symbol (reserved=17581KB, committed=17581KB)
>                             (malloc=16678KB #187368)
>                             (arena=903KB #1)
>
> -    Native Memory Tracking (reserved=9173KB, committed=9173KB)
>                             (malloc=1656KB #23012)
>                             (tracking overhead=7517KB)
>
> -        Shared class space (reserved=10904KB, committed=10904KB)
>                             (mmap: reserved=10904KB, committed=10904KB)
>
> -               Arena Chunk (reserved=288KB, committed=288KB)
>                             (malloc=288KB)
>
> -                   Logging (reserved=4KB, committed=4KB)
>                             (malloc=4KB #193)
>
> -                 Arguments (reserved=22KB, committed=22KB)
>                             (malloc=22KB #534)
>
> -                    Module (reserved=2726KB, committed=2726KB)
>                             (malloc=2726KB #9625)
>
> -              Synchronizer (reserved=1515KB, committed=1515KB)
>                             (malloc=1515KB #12006)
>
> -                 Safepoint (reserved=8KB, committed=8KB)
>                             (mmap: reserved=8KB, committed=8KB)
>
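
A rough reading of the numbers above: NMT only accounts for memory the JVM
itself allocates, so RocksDB and other native libraries do not appear in
it. The sketch below subtracts the NMT committed total from the reported
RSS, then subtracts the managed memory budget that RocksDB is supposed to
stay within. It assumes the 46.6GB figure from top is in GiB and that
RocksDB is using its full managed budget; both are assumptions, not
something the logs confirm.

```python
KIB_PER_GIB = 1024 * 1024
BYTES_PER_GIB = 1024 ** 3

rss_gib = 46.6                                   # from top; unit assumed to be GiB
nmt_committed_gib = 21049538 / KIB_PER_GIB       # "Total: ... committed=21049538KB"
managed_budget_gib = 20300431360 / BYTES_PER_GIB # "Managed" in the Flink config log

# Resident memory that NMT does not account for.
untracked_gib = rss_gib - nmt_committed_gib

# What remains if RocksDB fills its whole managed budget (assumption).
beyond_managed_gib = untracked_gib - managed_budget_gib
```

Under those assumptions about 26.5 GiB is outside NMT, and roughly 7.6 GiB
of that is not explained by the managed budget either. That is the kind of
gap that allocator fragmentation or untracked native allocations can
produce, which is why trying a different allocator is worth a shot.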
>
>
>
>
> On Wed, Sep 6, 2023 at 5:06 PM Biao Geng <biaoge...@gmail.com> wrote:
>
>> Hi Kenan,
>> If you have confirmed that the heap memory is OK (e.g. no Java OOM
>> exception and no frequent GC), then the cause may be excessive off-heap
>> memory usage, especially if your Flink job uses a native library.
>> To diagnose such a problem, you can refer to [1][2] for more details
>> about using NMT and jeprof.
>>
>> [1]
>> https://erikwramner.files.wordpress.com/2017/10/native-memory-leaks-in-java.pdf
>> [2] https://www.evanjones.ca/java-native-leak-bug.html
>> Best,
>> Biao Geng
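
When comparing NMT snapshots over time, it can help to pull the
per-category committed sizes out of the jcmd output and sort them. A
minimal Python sketch, assuming the raw "jcmd <pid> VM.native_memory
summary" text (sizes in KB, as in the output quoted above, without the ">"
quote prefixes); the function names are hypothetical:

```python
import re

# Matches category lines such as:
# "-                 Java Heap (reserved=18644992KB, committed=18644992KB)"
CATEGORY = re.compile(
    r"^-\s+(?P<name>.+?)\s+\(reserved=(?P<reserved>\d+)KB, committed=(?P<committed>\d+)KB\)"
)


def committed_by_category(summary):
    """Map each NMT category name to its committed size in KB."""
    sizes = {}
    for line in summary.splitlines():
        match = CATEGORY.match(line.strip())
        if match:
            sizes[match.group("name")] = int(match.group("committed"))
    return sizes


def largest_categories(summary, n=5):
    """Return the n categories with the most committed memory, largest first."""
    sizes = committed_by_category(summary)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]
```

Running largest_categories() on two summaries taken some time apart shows
which JVM-tracked category, if any, is growing. If none of them grows while
RSS does, the growth is outside what NMT can see.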
>>
>> On Wed, Sep 6, 2023 at 20:32, Kenan Kılıçtepe <kkilict...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have Flink 1.16.2 on a single server with 64GB of RAM.
>>>
>>> Although taskmanager.memory.process.size is set to 40000m, I can see
>>> the memory usage of the task manager exceed 59GB, and the OS kills it
>>> because of OOM.
>>> I check the RSS column in top for memory usage.
>>>
>>> I don't see any heap memory problem.
>>>
>>> taskmanager.memory.process.size: 40000m
>>> taskmanager.memory.managed.fraction: 0.53
>>> state.backend.rocksdb.memory.managed: true
>>>
>>> Any help is appreciated for analyzing the problem.
>>>
>>> Thanks
>>>
>>>
