[ https://issues.apache.org/jira/browse/FLINK-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15667321#comment-15667321 ]

ASF GitHub Bot commented on FLINK-5063:
---------------------------------------

GitHub user tillrohrmann opened a pull request:

    https://github.com/apache/flink/pull/2812

    [FLINK-5063] Discard state handles of declined or expired state handles

    Whenever the checkpoint coordinator receives an acknowledge checkpoint
    message that belongs to the job it maintains, it should either record the
    state handles for later processing or discard them to free the resources.
    The latter case can happen if a checkpoint has expired and late acknowledge
    checkpoint messages arrive. It can also happen if a Task sent a decline
    checkpoint message while other Tasks were still drawing a checkpoint. This
    PR changes the behaviour such that state handles belonging to the job of
    the checkpoint coordinator are discarded if they could not be added to the
    PendingCheckpoint.
    
    Review @uce, @StephanEwen 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tillrohrmann/flink fixStateHandleCleanup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/2812.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2812
    
----
commit c4c000d1b39de5617b6796eed524ce2a449100d3
Author: Till Rohrmann <trohrm...@apache.org>
Date:   2016-11-14T17:33:55Z

    [FLINK-5063] Discard state handles of declined or expired state handles
    
    Whenever the checkpoint coordinator receives an acknowledge checkpoint
    message that belongs to the job it maintains, it should either record the
    state handles for later processing or discard them to free the resources.
    The latter case can happen if a checkpoint has expired and late acknowledge
    checkpoint messages arrive. It can also happen if a Task sent a decline
    checkpoint message while other Tasks were still drawing a checkpoint. This
    PR changes the behaviour such that state handles belonging to the job of
    the checkpoint coordinator are discarded if they could not be added to the
    PendingCheckpoint.

----


> State handles are not properly cleaned up for declined or expired checkpoints
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-5063
>                 URL: https://issues.apache.org/jira/browse/FLINK-5063
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.2.0, 1.1.3
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.2.0, 1.1.4
>
>
> If a {{Checkpoint}} is declined or expires, the {{CheckpointCoordinator}}
> disposes of the {{PendingCheckpoint}}. Disposing of the
> {{PendingCheckpoint}} entails discarding all {{SubtaskStates}} registered so
> far by the acknowledged {{Tasks}}. However, all late-arriving acknowledge
> messages are simply ignored without discarding the transmitted state
> handles. This can clutter the checkpoint directory, since the checkpoint
> files of late or unknown acknowledge checkpoint messages are never deleted.
> I propose to discard the state handles at the {{CheckpointCoordinator}} when
> it receives a late acknowledge message, or an acknowledge message for an
> unknown {{ExecutionAttemptID}}, belonging to the job of the
> {{CheckpointCoordinator}}. Checkpoint messages belonging to a different job
> will not be handled and are simply ignored.
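
The proposed handling can be illustrated with a small, self-contained sketch. It is not the code from the pull request: every class and method name below (SketchStateHandle, SketchPendingCheckpoint, SketchCheckpointCoordinator, trigger, abort, receiveAcknowledge) is hypothetical, and job and attempt IDs are modelled as plain strings. It only shows the three outcomes described above: record the handle, discard it, or ignore a message from another job.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a state handle; not Flink's real interface. */
class SketchStateHandle {
    boolean discarded = false;

    void discardState() {
        discarded = true; // real code would delete the checkpoint files here
    }
}

/** Hypothetical pending checkpoint tracking which attempts still have to acknowledge. */
class SketchPendingCheckpoint {
    private final Set<String> notYetAcknowledged;
    private final Map<String, SketchStateHandle> acknowledged = new HashMap<>();

    SketchPendingCheckpoint(Set<String> expectedAttempts) {
        this.notYetAcknowledged = new HashSet<>(expectedAttempts);
    }

    /** Returns false if the attempt is unknown or has already acknowledged. */
    boolean acknowledge(String attemptId, SketchStateHandle handle) {
        if (!notYetAcknowledged.remove(attemptId)) {
            return false;
        }
        acknowledged.put(attemptId, handle);
        return true;
    }

    /** Discards every handle registered so far (decline or expiry). */
    void dispose() {
        for (SketchStateHandle handle : acknowledged.values()) {
            handle.discardState();
        }
        acknowledged.clear();
    }
}

class SketchCheckpointCoordinator {
    private final String jobId;
    private final Map<Long, SketchPendingCheckpoint> pending = new HashMap<>();

    SketchCheckpointCoordinator(String jobId) {
        this.jobId = jobId;
    }

    void trigger(long checkpointId, Set<String> expectedAttempts) {
        pending.put(checkpointId, new SketchPendingCheckpoint(expectedAttempts));
    }

    /** Models a declined or expired checkpoint. */
    void abort(long checkpointId) {
        SketchPendingCheckpoint checkpoint = pending.remove(checkpointId);
        if (checkpoint != null) {
            checkpoint.dispose();
        }
    }

    /** Returns true only if the handle was recorded in a pending checkpoint. */
    boolean receiveAcknowledge(String messageJobId, long checkpointId,
                               String attemptId, SketchStateHandle handle) {
        if (!jobId.equals(messageJobId)) {
            return false; // different job: ignore the message, leave its state alone
        }
        SketchPendingCheckpoint checkpoint = pending.get(checkpointId);
        if (checkpoint != null && checkpoint.acknowledge(attemptId, handle)) {
            return true; // recorded for later processing
        }
        // Late message or unknown attempt for our own job: free the resources.
        // (Simplification: a duplicate acknowledge is treated the same way.)
        handle.discardState();
        return false;
    }
}
```

In this sketch the job check comes first, so a coordinator never discards state that another job may still need. Only handles that belong to its own job and cannot be added to a pending checkpoint are discarded.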



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
