Stephan Ewen created FLINK-18429:
------------------------------------

             Summary: Add default method for 
CheckpointListener.notifyCheckpointAborted(checkpointId)
                 Key: FLINK-18429
                 URL: https://issues.apache.org/jira/browse/FLINK-18429
             Project: Flink
          Issue Type: Bug
          Components: API / DataStream
            Reporter: Stephan Ewen
            Assignee: Stephan Ewen
             Fix For: 1.11.0


The {{CheckpointListener}} interface is implemented by many users. Adding a new 
method {{notifyCheckpointAborted(long)}} to the interface without a default 
method breaks many user programs.

We should turn this method into a default method:
  - Avoid breaking programs
  - It is in practice less relevant for programs to react to checkpoints being 
aborted then to being completed. The reason is that on completion you often 
want to commit side-effects, while on abortion you frequently do not do 
anything, but let the next successful checkpoint commit all changes up to then.

*Original Confusion*

There was confusion about this originally, going back to a comment by myself 
suggesting this should not be a default method, incorrectly thinking of it as 
an internal interface: 
https://github.com/apache/flink/pull/8693#issuecomment-542834147

See also clarification email on the mailing list:
{noformat}
About the "notifyCheckpointAborted()":

When I wrote that comment, I was (apparently wrongly) assuming we were talking 
about an internal interface here, because the "abort" signal was originally 
only intended to cancel the async part of state backend checkpoints.

I just realized that this is exposed to users - and I am actually with Thomas 
on this one. The "CheckpointListener" is a very public interface that many 
users implement. The fact that it is tagged "@PublicEvolving" is somehow not 
aligned with reality. So adding the method here will in reality break lots and 
lots of user programs.

I think also in practice it is much less relevant for user applications to 
react to aborted checkpoints. Since the notifications there can not be relied 
upon (if there is a task failure concurrently) users always have to follow the 
"newer checkpoint subsumes older checkpoint" contract, so the abort method is 
probably rarely relevant.

This is something we should change, in my opinion.
{noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to