zlzhang0122 created FLINK-23189:
-----------------------------------

             Summary: Count and fail the task when the disk is error on JobManager
                 Key: FLINK-23189
                 URL: https://issues.apache.org/jira/browse/FLINK-23189
             Project: Flink
          Issue Type: Improvement
    Affects Versions: 1.13.1, 1.12.2
            Reporter: zlzhang0122

When the JobManager's disk fails, triggerCheckpoint throws an IOException and the checkpoint trigger fails with TRIGGER_CHECKPOINT_FAILURE, but this failure does not cause the job to fail. Users can hardly notice the error unless they read the JobManager logs. To avoid this, I propose that we identify these IOException cases and increment the failureCounter, so that the job eventually fails.
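
To make the proposal concrete, here is a rough, self-contained sketch of the counting logic. It is not Flink code: the class `TriggerFailureCounter`, its methods, and the threshold parameter `tolerableFailures` are hypothetical names chosen for illustration, and how this would hook into the real failureCounter is left open. The assumption is that only trigger failures caused by an IOException are counted, that other trigger failures keep being ignored as they are today, and that a completed checkpoint resets the count.

```java
import java.io.IOException;

/** Hypothetical helper for illustration only; not part of Flink's API. */
final class TriggerFailureCounter {
    private final int tolerableFailures; // assumed threshold, name is illustrative
    private int continuousFailures;

    TriggerFailureCounter(int tolerableFailures) {
        if (tolerableFailures < 0) {
            throw new IllegalArgumentException("tolerableFailures must be >= 0");
        }
        this.tolerableFailures = tolerableFailures;
    }

    /**
     * Records a checkpoint trigger failure.
     *
     * @return true if the caller should now fail the job
     */
    boolean onTriggerFailure(Throwable cause) {
        if (!isCausedByIOException(cause)) {
            // Non-IO trigger failures are not counted in this sketch.
            return false;
        }
        continuousFailures++;
        return continuousFailures > tolerableFailures;
    }

    /** A completed checkpoint resets the run of consecutive failures. */
    void onCheckpointCompleted() {
        continuousFailures = 0;
    }

    int getContinuousFailures() {
        return continuousFailures;
    }

    /** Walks the cause chain (bounded, to guard against cycles) looking for an IOException. */
    static boolean isCausedByIOException(Throwable t) {
        for (int depth = 0; t != null && depth < 32; t = t.getCause(), depth++) {
            if (t instanceof IOException) {
                return true;
            }
        }
        return false;
    }
}
```

With `new TriggerFailureCounter(2)`, the first two IOException-caused trigger failures are tolerated and the third makes `onTriggerFailure` return true, at which point the caller would fail the job instead of only logging the error on the JobManager.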