[jira] [Commented] (FLINK-5285) CancelCheckpointMarker flood when using at least once mode

ASF GitHub Bot (JIRA) Thu, 08 Dec 2016 08:21:24 -0800

    [ 
https://issues.apache.org/jira/browse/FLINK-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15732633#comment-15732633
 ]


ASF GitHub Bot commented on FLINK-5285:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2964#discussion_r91547796
  
    --- Diff: 
flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/io/BarrierTracker.java
 ---
    @@ -225,17 +230,19 @@ private void 
processCheckpointAbortBarrier(CancelCheckpointMarker barrier, int c
                                pendingCheckpoints.removeFirst();
                        }
                }
    -           else {
    +           else if (checkpointId > latestPendingCheckpointID) {
                        notifyAbort(checkpointId);
     
    -                   // first barrier for this checkpoint - remember it as 
aborted
    -                   // since we polled away all entries with lower 
checkpoint IDs
    -                   // this entry will become the new first entry
    -                   if (pendingCheckpoints.size() < 
MAX_CHECKPOINTS_TO_TRACK) {
    -                           CheckpointBarrierCount abortedMarker = new 
CheckpointBarrierCount(checkpointId);
    -                           abortedMarker.markAborted();
    -                           pendingCheckpoints.addFirst(abortedMarker);
    -                   }
    +                   latestPendingCheckpointID = checkpointId;
    +
    +                   CheckpointBarrierCount abortedMarker = new 
CheckpointBarrierCount(checkpointId);
    +                   abortedMarker.markAborted();
    +                   pendingCheckpoints.addLast(abortedMarker);
    --- End diff --
    
    Small comment here: I would
      - either keep the `addFirst()` statement here (we can be sure that is 
true, given that we pulled out all older checkpoints)
      - or add a sanity check that `pendingCheckpoints` is empty at that point.
    
    That way we explicitly guard the assumption that `pendingCheckpoints` 
contains entries on ordered sequence (which is currently only implicitly 
guarded by the `checkpointId > latestPendingCheckpointID` condition.


> CancelCheckpointMarker flood when using at least once mode
> ----------------------------------------------------------
>
>                 Key: FLINK-5285
>                 URL: https://issues.apache.org/jira/browse/FLINK-5285
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>             Fix For: 1.2.0, 1.1.4
>
>
> When using at least once mode ({{BarrierTracker}}), then an interleaved 
> arrival of cancellation barriers at the {{BarrierTracker}} of two consecutive 
> checkpoints can trigger a flood of {{CancelCheckpointMarkers}}. 
> The following sequence is problematic:
> {code}
> Cancel(1, 0),
> Cancel(2, 0),
> Cancel(1, 1),
> Cancel(2, 1),
> Cancel(1, 2),
> Cancel(2, 2)
> {code}
> with {{Cancel(checkpointId, channelId)}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (FLINK-5285) CancelCheckpointMarker flood when using at least once mode

Reply via email to