[ https://issues.apache.org/jira/browse/FLINK-23189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17418602#comment-17418602 ]
zlzhang0122 edited comment on FLINK-23189 at 9/22/21, 1:52 PM:
---------------------------------------------------------------

[~pnowojski] OK, I've looked at the fix and see that it adds handling of onTriggerFailure when the checkpoint is null. I had noticed this situation, but I could not reproduce it in our production environment, so I did not change the code here. We may indeed need this fix for this case.


was (Author: zlzhang0122):
[~pnowojski] OK, I've looked at the fix and see that it adds handling of onTriggerFailure when the checkpoint is null. I had found this situation, but I could not reproduce it in our production environment, so I did not change the code here. We may indeed need that fix for some corner cases.

> Count and fail the task when the disk is error on JobManager
> ------------------------------------------------------------
>
>                 Key: FLINK-23189
>                 URL: https://issues.apache.org/jira/browse/FLINK-23189
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.12.2, 1.13.1
>            Reporter: zlzhang0122
>            Assignee: zlzhang0122
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.14.0
>
>         Attachments: exception.txt
>
>
> When the JobManager disk has an error, triggerCheckpoint throws an
> IOException and fails. This causes a TRIGGER_CHECKPOINT_FAILURE, but that
> failure does not cause the job to fail. Users can hardly find this error
> unless they look at the JobManager logs. To avoid this, I propose that we
> identify these IOException cases and increase the failureCounter, which
> can eventually fail the job.
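
The proposal in the issue description can be illustrated with a minimal sketch. It is not Flink's actual code: the class `FailureCounter`, its method names, and the `tolerableFailures` threshold are hypothetical names chosen for illustration. It only shows the idea of counting IOExceptions raised while triggering a checkpoint and signalling a job failure once the count exceeds a tolerated limit.

```java
import java.io.IOException;

public class TriggerFailureCounterSketch {

    /** Hypothetical stand-in for a failure counter; not a Flink class. */
    static final class FailureCounter {
        private final int tolerableFailures; // assumed threshold, for illustration only
        private int continuousFailures;

        FailureCounter(int tolerableFailures) {
            this.tolerableFailures = tolerableFailures;
        }

        /**
         * Records a trigger failure. Only IOExceptions are counted here,
         * mirroring the proposal. Returns true when the job should fail.
         */
        boolean onTriggerFailure(Throwable cause) {
            if (cause instanceof IOException) {
                continuousFailures++;
            }
            return continuousFailures > tolerableFailures;
        }

        /** A successful checkpoint resets the counter. */
        void onCheckpointSuccess() {
            continuousFailures = 0;
        }
    }

    public static void main(String[] args) {
        FailureCounter counter = new FailureCounter(2);
        for (int attempt = 1; attempt <= 3; attempt++) {
            boolean failJob =
                    counter.onTriggerFailure(new IOException("disk error on JobManager"));
            System.out.println("attempt " + attempt + ": failJob=" + failJob);
        }
    }
}
```

With a tolerated limit of 2, the third consecutive IOException in this sketch is the one that signals a job failure, so the error surfaces to the user instead of staying only in the JobManager logs.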