curcur commented on a change in pull request #18964: URL: https://github.com/apache/flink/pull/18964#discussion_r820493182
##########
File path: flink-state-backends/flink-statebackend-changelog/src/main/java/org/apache/flink/state/changelog/PeriodicMaterializationManager.java
##########
@@ -166,6 +167,10 @@ private void asyncMaterializationPhase(
                         subtaskName,
                         upTo);
 
+                scheduleNextMaterialization();
+            } else if (throwable instanceof CancellationException) {
+                // likely due to task cancellation or abortion notification
+                LOG.info("materialization cancelled", throwable);
                 scheduleNextMaterialization();
             } else {

Review comment:
   > > If a task gets a CancellationException, shouldn't it fail the whole job?
   >
   > No, task cancellation doesn't mean job failure.

   This is true.

   > > Why can a checkpoint abortion notification pass a CancellationException to materialization? Materialization should be independent of checkpointing.
   >
   > Currently, the issue happens only because of task cancellation; an abortion notification cannot reach the nested backend, and therefore cannot reach the materializer. But with [FLINK-25850](https://issues.apache.org/jira/browse/FLINK-25850) it will be possible. So I added this comment and decided not to react to `CancellationException` (e.g. by stopping the materializer).

   Shouldn't checkpointing be independent of materialization? Checkpoint abortion should not affect materialization; that is the main purpose of separating materialization from the checkpointing procedure. I can't agree with this part.
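
   To make the branch under discussion concrete, here is a minimal, self-contained sketch of the "log and reschedule on cancellation" behaviour. It is not the Flink code: the class `MaterializationCallbackSketch`, its counters, and the use of a plain `CompletableFuture` for the async phase are assumptions made for illustration, and the final `else` branch (not shown in the diff) is replaced by a placeholder that only counts failures.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the callback in the diff; not Flink's actual class. */
class MaterializationCallbackSketch {

    /** Counts how often the next materialization would have been scheduled. */
    final AtomicInteger scheduledRuns = new AtomicInteger();

    /** Counts failures that would be handed to some error handler (placeholder). */
    final AtomicInteger reportedFailures = new AtomicInteger();

    /** Attaches the completion handler to one asynchronous materialization attempt. */
    void onAsyncPhase(CompletableFuture<Void> asyncPhase) {
        asyncPhase.whenComplete((ignored, throwable) -> {
            Throwable cause = unwrap(throwable);
            if (cause == null) {
                // Success: keep the periodic loop going.
                scheduleNextMaterialization();
            } else if (cause instanceof CancellationException) {
                // Cancellation is treated as benign here: note it and keep the loop going.
                System.out.println("materialization cancelled: " + cause);
                scheduleNextMaterialization();
            } else {
                // Placeholder for the real failure handling, which the diff does not show.
                reportedFailures.incrementAndGet();
            }
        });
    }

    /** Dependent stages may wrap the original exception in a CompletionException. */
    private static Throwable unwrap(Throwable t) {
        return (t instanceof CompletionException && t.getCause() != null) ? t.getCause() : t;
    }

    private void scheduleNextMaterialization() {
        scheduledRuns.incrementAndGet();
    }

    public static void main(String[] args) {
        MaterializationCallbackSketch sketch = new MaterializationCallbackSketch();

        CompletableFuture<Void> cancelled = new CompletableFuture<>();
        sketch.onAsyncPhase(cancelled);
        cancelled.cancel(true); // completes the future with a CancellationException

        System.out.println("scheduled=" + sketch.scheduledRuns.get()
                + ", failures=" + sketch.reportedFailures.get());
    }
}
```

   In this sketch a cancelled attempt is indistinguishable from a successful one as far as scheduling goes, which is the behaviour the diff introduces. Stopping the materializer on cancellation would instead mean dropping the `scheduleNextMaterialization()` call from the middle branch.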