Hi,
Thanks for the answer.
I will go through the documents you shared.
Still, it would be great if you could take a look at the numbers below and
give some tips.


At the moment RSS is 46.6GB, although taskmanager.memory.process.size is set
to 40000m.

GC Statistics:
2023-09-06 15:15:03,785 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Memory usage stats: [HEAP: 3703/18208/18208 MB, NON HEAP: 154/175/744 MB (used/committed/max)]
2023-09-06 15:15:03,785 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Direct memory stats: Count: 33620, Total Capacity: 1102003811, Used Memory: 1102003812
2023-09-06 15:15:03,785 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Off-heap pool stats: [CodeHeap 'non-nmethods': 1/3/7 MB (used/committed/max)], [Metaspace: 87/99/256 MB (used/committed/max)], [CodeHeap 'profiled nmethods': 32/35/116 MB (used/committed/max)], [Compressed Class Space: 10/14/248 MB (used/committed/max)], [CodeHeap 'non-profiled nmethods': 21/22/116 MB (used/committed/max)]
2023-09-06 15:15:03,785 INFO  org.apache.flink.runtime.taskexecutor.TaskManagerRunner      [] - Garbage collector stats: [G1 Young Generation, GC TIME (ms): 30452, GC COUNT: 351], [G1 Old Generation, GC TIME (ms): 0, GC COUNT: 0]

My configuration:

INFO  [] - Final TaskExecutor Memory configuration:
INFO  [] -   Total Process Memory:          39.063gb (41943040000 bytes)
INFO  [] -     Total Flink Memory:          37.813gb (40600862720 bytes)
INFO  [] -       Total JVM Heap Memory:     17.781gb (19092471808 bytes)
INFO  [] -         Framework:               128.000mb (134217728 bytes)
INFO  [] -         Task:                    17.656gb (18958254080 bytes)
INFO  [] -       Total Off-heap Memory:     20.031gb (21508390912 bytes)
INFO  [] -         Managed:                 18.906gb (20300431360 bytes)
INFO  [] -         Total JVM Direct Memory: 1.125gb (1207959552 bytes)
INFO  [] -           Framework:             128.000mb (134217728 bytes)
INFO  [] -           Task:                  0 bytes
INFO  [] -           Network:               1024.000mb (1073741824 bytes)
INFO  [] -     JVM Metaspace:               256.000mb (268435456 bytes)
INFO  [] -     JVM Overhead:                1024.000mb (1073741824 bytes)
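
As a sanity check on the configuration above, here is a minimal Python sketch
that re-adds the logged components. The byte values are copied from the log;
the variable names are only illustrative. It suggests the budget is internally
consistent, and that managed memory comes out at exactly 0.5 of Total Flink
Memory, which does not match the taskmanager.memory.managed.fraction of 0.53
quoted earlier in the thread (the setting may have changed between messages).

```python
GIB = 1024 ** 3

# Byte values copied from "Final TaskExecutor Memory configuration".
framework_heap = 134217728
task_heap = 18958254080
managed = 20300431360
framework_off_heap = 134217728
task_off_heap = 0
network = 1073741824
metaspace = 268435456
jvm_overhead = 1073741824

# Re-derive the aggregate lines of the log from their components.
jvm_heap = framework_heap + task_heap
direct = framework_off_heap + task_off_heap + network
total_flink = jvm_heap + managed + direct
total_process = total_flink + metaspace + jvm_overhead

# Share of Total Flink Memory that went to managed memory.
managed_fraction = managed / total_flink

print(f"JVM heap:       {jvm_heap / GIB:.3f} GiB")
print(f"Direct memory:  {direct / GIB:.3f} GiB")
print(f"Total Flink:    {total_flink / GIB:.3f} GiB")
print(f"Total process:  {total_process / GIB:.3f} GiB")
print(f"Managed share:  {managed_fraction:.2f}")
```

Since the components add up to the configured 40000m, the RSS above that
figure has to come from memory that Flink's budget does not cap.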

jcmd output:

ubuntu@dzs-tef-test-01:~/flink/log$ jcmd 1035173  VM.native_memory summary
1035173:

Native Memory Tracking:

Total: reserved=23615554KB, committed=21049538KB
-                 Java Heap (reserved=18644992KB, committed=18644992KB)
                            (mmap: reserved=18644992KB, committed=18644992KB)

-                     Class (reserved=347038KB, committed=106970KB)
                            (classes #15959)
                            (  instance classes #15140, array classes #819)
                            (malloc=5022KB #72815)
                            (mmap: reserved=342016KB, committed=101948KB)
                            (  Metadata:   )
                            (    reserved=88064KB, committed=86948KB)
                            (    used=79128KB)
                            (    free=7820KB)
                            (    waste=0KB =0.00%)
                            (  Class space:)
                            (    reserved=253952KB, committed=15000KB)
                            (    used=11278KB)
                            (    free=3722KB)
                            (    waste=0KB =0.00%)

-                    Thread (reserved=2404259KB, committed=262791KB)
                            (thread #2328)
                            (stack: reserved=2393052KB, committed=251584KB)
                            (malloc=8481KB #13970)
                            (arena=2726KB #4654)

-                      Code (reserved=252334KB, committed=67866KB)
                            (malloc=4650KB #21507)
                            (mmap: reserved=247684KB, committed=63216KB)

-                        GC (reserved=800181KB, committed=800181KB)
                            (malloc=74637KB #63221)
                            (mmap: reserved=725544KB, committed=725544KB)

-                  Compiler (reserved=20432KB, committed=20432KB)
                            (malloc=20300KB #8557)
                            (arena=133KB #5)

-                  Internal (reserved=21883KB, committed=21871KB)
                            (malloc=21839KB #29146)
                            (mmap: reserved=44KB, committed=32KB)

-                     Other (reserved=1082212KB, committed=1082212KB)
                            (malloc=1082212KB #34463)

-                    Symbol (reserved=17581KB, committed=17581KB)
                            (malloc=16678KB #187368)
                            (arena=903KB #1)

-    Native Memory Tracking (reserved=9173KB, committed=9173KB)
                            (malloc=1656KB #23012)
                            (tracking overhead=7517KB)

-        Shared class space (reserved=10904KB, committed=10904KB)
                            (mmap: reserved=10904KB, committed=10904KB)

-               Arena Chunk (reserved=288KB, committed=288KB)
                            (malloc=288KB)

-                   Logging (reserved=4KB, committed=4KB)
                            (malloc=4KB #193)

-                 Arguments (reserved=22KB, committed=22KB)
                            (malloc=22KB #534)

-                    Module (reserved=2726KB, committed=2726KB)
                            (malloc=2726KB #9625)

-              Synchronizer (reserved=1515KB, committed=1515KB)
                            (malloc=1515KB #12006)

-                 Safepoint (reserved=8KB, committed=8KB)
                            (mmap: reserved=8KB, committed=8KB)
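
To put the NMT summary next to the RSS figure, here is a rough Python sketch.
NMT only tracks memory allocated by the JVM itself, so RocksDB's native
allocations do not appear in it. The sketch assumes that RocksDB uses its
whole managed memory budget and that the 46.6GB RSS is in GiB; both are
assumptions, not measurements.

```python
KIB = 1024
GIB = 1024 ** 3

# Copied from the jcmd output and the memory configuration above.
nmt_committed_kb = 21049538   # "Total: ... committed=21049538KB"
managed_bytes = 20300431360   # Flink managed memory (RocksDB budget)
rss_gib = 46.6                # assumption: the reported RSS is in GiB

nmt_gib = nmt_committed_kb * KIB / GIB
managed_gib = managed_bytes / GIB

# Assumption: RocksDB has filled its managed budget completely.
accounted_gib = nmt_gib + managed_gib
unexplained_gib = rss_gib - accounted_gib

print(f"NMT committed:        {nmt_gib:.2f} GiB")
print(f"Managed (RocksDB):    {managed_gib:.2f} GiB")
print(f"Accounted for:        {accounted_gib:.2f} GiB")
print(f"Unexplained RSS:      {unexplained_gib:.2f} GiB")
```

Under these assumptions roughly 7.6 GiB of RSS is visible neither in NMT nor
in the managed budget, which would point at native allocations outside the
JVM (for example RocksDB exceeding its budget, or allocator fragmentation).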





On Wed, Sep 6, 2023 at 5:06 PM Biao Geng <biaoge...@gmail.com> wrote:

> Hi Kenan,
> If you have confirmed that heap memory is OK (e.g. no Java OOM exception and
> no frequent GC), then the cause may be off-heap memory overuse,
> especially when your Flink job uses a native library.
> To diagnose such a problem, you can refer to [1][2] for more details about
> using NMT and jeprof.
>
> [1]
> https://erikwramner.files.wordpress.com/2017/10/native-memory-leaks-in-java.pdf
> [2] https://www.evanjones.ca/java-native-leak-bug.html
> Best,
> Biao Geng
>
> Kenan Kılıçtepe <kkilict...@gmail.com> wrote on Wed, Sep 6, 2023 at 20:32:
>
>> Hi,
>>
>> I am running Flink 1.16.2 on a single server with 64GB RAM.
>>
>> Although taskmanager.memory.process.size is set to 40000m, the memory
>> usage of the task manager exceeds 59GB and the OS kills it because of
>> OOM.
>> I measure memory usage from the RSS column of top.
>>
>> I don't see any heap memory problem.
>>
>> taskmanager.memory.process.size: 40000m
>> taskmanager.memory.managed.fraction: 0.53
>> state.backend.rocksdb.memory.managed: true
>>
>> Any help with analyzing the problem is appreciated.
>>
>> Thanks
>>
>>
