1996fanrui commented on code in PR #21281: URL: https://github.com/apache/flink/pull/21281#discussion_r1018967494
##########
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointFailureManager.java:
##########
@@ -204,7 +204,8 @@ private void checkFailureAgainstCounter(
         if (continuousFailureCounter.get() > tolerableCpFailureNumber) {
             clearCount();
             errorHandler.accept(
-                    new FlinkRuntimeException(EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE));
+                    new FlinkRuntimeException(
+                            EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE, exception));

Review Comment:
   @Myasuka Thanks for your feedback. You are right: the proper way to get the full information is to check the JM log or the checkpoint UI. I added this change for a few reasons:

   - Some Flink platforms collect exceptions. When the job fails and the JM stops, users can see the root cause of the last checkpoint failure directly in the exception. At that point the Web UI is no longer available, and reading the exception is more convenient than searching the JM log.
   - Attaching the root cause does not change the existing logic.
   - During feature development, ITCases are often run without logging enabled. When one fails, it only reports `Exceeded checkpoint tolerable failure threshold.` and does not show the root cause, which makes the problem hard to locate. 😂

   I agree that this change is not strictly necessary. Please take a look at these reasons; I will close this PR if it is not needed.

   Thanks~
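
To make the difference concrete, here is a minimal standalone sketch of the cause-chaining pattern in the diff. It is an illustration under assumptions, not the Flink code: plain `RuntimeException` stands in for `FlinkRuntimeException` so the snippet has no dependencies, and the class name, method names, and simulated `IOException` are hypothetical.

```java
import java.io.IOException;

// Hypothetical stand-in for the pattern in the diff. Plain RuntimeException is
// used instead of FlinkRuntimeException so the sketch compiles on its own.
class CauseChainingSketch {

    static final String MESSAGE = "Exceeded checkpoint tolerable failure threshold.";

    // Before the change: only the generic message; the triggering failure is dropped.
    static RuntimeException withoutCause() {
        return new RuntimeException(MESSAGE);
    }

    // After the change: same message, with the triggering failure attached as the cause.
    static RuntimeException withCause(Throwable lastFailure) {
        return new RuntimeException(MESSAGE, lastFailure);
    }

    // Walks the cause chain and returns the innermost throwable.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Simulated failure; the real exception type depends on what broke the checkpoint.
        Throwable lastFailure = new IOException("simulated checkpoint storage failure");

        // Without the cause, the innermost throwable is the wrapper itself.
        System.out.println(rootCause(withoutCause()).getMessage());
        // With the cause, the original failure remains reachable from the wrapper.
        System.out.println(rootCause(withCause(lastFailure)).getMessage());
    }
}
```

With the cause attached, anything that only captures the final exception (an exception-collecting platform, or a test run without logging) can still reach the original failure through `getCause()`, while the top-level message stays the same.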