Rui Fan created FLINK-30184:
-------------------------------

             Summary: Save TM/JM thread stack periodically
                 Key: FLINK-30184
                 URL: https://issues.apache.org/jira/browse/FLINK-30184
             Project: Flink
          Issue Type: Improvement
          Components: Runtime / Web Frontend
            Reporter: Rui Fan
             Fix For: 1.17.0


After FLINK-14816, FLINK-25398 and FLINK-25372, Flink users can view the 
thread stacks of the TM/JM in the Flink Web UI. 

This helps Flink users find out why a Flink job is stuck or why processing 
is slow. It is very useful for troubleshooting.

However, sometimes Flink tasks get stuck or process slowly, but by the time 
the user troubleshoots the problem, the job has already recovered. It is then 
difficult to find out what happened to the Flink job at that time and why it 
was slow.

So, could we periodically save the thread stacks of the TM or JM in the TM 
log directory?

This would need some new configuration options, for example:
cluster.thread-dump.interval=1min
cluster.thread-dump.cleanup-time=48 hours
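
To make the idea more concrete, here is a rough sketch of how the periodic 
dump and the cleanup could work, using only plain JDK APIs. It is not existing 
Flink code: the class name PeriodicThreadDumper, its method names and the 
"thread-dump-<timestamp>.txt" file naming are all hypothetical. The interval 
and maxAge parameters stand in for the two proposed options above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration only; not part of Flink. */
final class PeriodicThreadDumper {

    /** Writes the stacks of all live threads to a new file in logDir and returns that file. */
    static Path dumpOnce(Path logDir) {
        StringBuilder sb = new StringBuilder();
        // (false, false): skip lock details, which not every JVM supports.
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
                    .append(info.getThreadState()).append('\n');
            // Print every frame; ThreadInfo.toString() would truncate the stack.
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        try {
            Files.createDirectories(logDir);
            // Assumed naming scheme: thread-dump-<epoch millis>.txt
            Path file = logDir.resolve("thread-dump-" + System.currentTimeMillis() + ".txt");
            return Files.write(file, sb.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes dump files last modified more than maxAge ago; returns how many were deleted. */
    static int cleanup(Path logDir, Duration maxAge) {
        Instant cutoff = Instant.now().minus(maxAge);
        int deleted = 0;
        try (DirectoryStream<Path> dumps = Files.newDirectoryStream(logDir, "thread-dump-*.txt")) {
            for (Path dump : dumps) {
                if (Files.getLastModifiedTime(dump).toInstant().isBefore(cutoff)) {
                    Files.delete(dump);
                    deleted++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return deleted;
    }

    /** Dumps immediately and then once per interval, cleaning up old dumps after each run. */
    static ScheduledExecutorService start(Path logDir, Duration interval, Duration maxAge) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "thread-dump-saver");
            t.setDaemon(true); // must not keep the JVM alive on shutdown
            return t;
        });
        executor.scheduleAtFixedRate(() -> {
            try {
                dumpOnce(logDir);
                cleanup(logDir, maxAge);
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so only report it.
                System.err.println("Thread dump failed: " + e);
            }
        }, 0, interval.toMillis(), TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

With the values proposed above, this would be started as 
PeriodicThreadDumper.start(logDir, Duration.ofMinutes(1), Duration.ofHours(48)). 
Running the cleanup in the same scheduled task keeps everything on one thread; 
a real implementation might instead rely on the existing log rotation.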



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
