Hi Yun and Yu,

Thanks for driving this. This would definitely help users identify performance 
bottlenecks, especially for the cases where the bottleneck lies in the system 
stack (e.g. GC), and big +1 for the downloadable flamegraph to ease sharing. 
I'm wondering if we could add this for the job manager as well. In the OLAP 
scenario and sometimes in the streaming scenario (when there are heavy 
operations during execution plan generation or in operator coordinators), the 
JM can become a bottleneck as well.

Best,
Zhanghao Chen
________________________________
From: Yu Chen <yuchen.e...@gmail.com>
Sent: Monday, October 9, 2023 17:24
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [DISCUSS] FLIP-375: Built-in cross-platform powerful java profiler on 
taskmanagers

Hi all,

Yun Tang and I are opening this thread to discuss our proposal to integrate
async-profiler's capabilities for profiling taskmanagers (e.g., generating
flame graphs) into the Flink Web UI [1].


Currently, Flink provides ThreadDump and Operator-Level Flame Graphs by
sampling task threads. Results generated this way miss the relevant stacks
of Java threads and system calls. The async-profiler [2] is a low-overhead
sampling profiler for Java, but the steps to use it in a production
environment are cumbersome and carry permission and security risks.

Therefore, we propose adding REST APIs that invoke async-profiler on
multiple platforms through JNI and can be easily operated from the Web UI.
This enhancement will improve the efficiency and experience of Flink users
in identifying performance bottlenecks.



Please refer to the FLIP document for more details about the proposed design
and implementation. We welcome any feedback and opinions on this proposal.



[1] FLIP-375: Built-in cross-platform powerful java profiler on
taskmanagers - Apache Flink - Apache Software Foundation
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-375%3A+Built-in+cross-platform+powerful+java+profiler+on+taskmanagers>

[2] GitHub - async-profiler/async-profiler: Sampling CPU and HEAP profiler
for Java featuring AsyncGetCallTrace + perf_events
<https://github.com/async-profiler/async-profiler>



Best regards,

Yun Tang and Yu Chen
