1996fanrui commented on code in PR #21281:
URL: https://github.com/apache/flink/pull/21281#discussion_r1018967494


##########
flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/CheckpointFailureManager.java:
##########
@@ -204,7 +204,8 @@ private void checkFailureAgainstCounter(
             if (continuousFailureCounter.get() > tolerableCpFailureNumber) {
                 clearCount();
                 errorHandler.accept(
-                        new FlinkRuntimeException(EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE));
+                        new FlinkRuntimeException(
+                                EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE, exception));

Review Comment:
   @Myasuka Thanks for your feedback.
   
   You are right: the proper way to get the full information is to check the JM log or the checkpoint UI.
   
   I added this change for a few reasons:
   
   - Some Flink platforms collect exceptions. When the job fails and the JM stops, users can easily see the root cause of the last checkpoint failure from the exception. At that point the WebUI has already stopped, and the exception is more convenient than the JM log.
   - Attaching the root cause has no effect on the existing logic.
   - During feature development, ITCases are often run without logging enabled. When an ITCase fails, it only shows `Exceeded checkpoint tolerable failure threshold.` and not the root cause, which makes the problem hard to locate. 😂
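   
   To illustrate the difference, here is a minimal sketch. It uses plain `RuntimeException` as a stand-in for `FlinkRuntimeException` so that it runs without Flink on the classpath; the class name, method names, and the `IOException` failure are hypothetical and only for illustration.

```java
import java.io.IOException;

public class CauseChainingSketch {
    // Same text as the message quoted above; the constant name is illustrative.
    static final String EXCEEDED_MESSAGE =
            "Exceeded checkpoint tolerable failure threshold.";

    // Old behavior: the checkpoint failure is dropped, only the message survives.
    static RuntimeException withoutCause(Throwable checkpointFailure) {
        return new RuntimeException(EXCEEDED_MESSAGE);
    }

    // Proposed behavior: the checkpoint failure is attached as the cause.
    static RuntimeException withCause(Throwable checkpointFailure) {
        return new RuntimeException(EXCEEDED_MESSAGE, checkpointFailure);
    }

    public static void main(String[] args) {
        // Hypothetical root cause of the last failed checkpoint.
        Throwable failure = new IOException("hypothetical checkpoint failure");

        // No cause is available to an exception collector or a test report.
        System.out.println(withoutCause(failure).getCause());

        // The stack trace now ends with a "Caused by: java.io.IOException" section.
        withCause(failure).printStackTrace();
    }
}
```

   With the cause attached, anything that only receives the final exception (an exception collector, or a test report without logs) can still walk `getCause()` to reach the last checkpoint failure.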
   
   I also don't think this change is strictly necessary. Please take a look at these reasons, and I will close this PR if it is not needed. Thanks~


