[jira] [Commented] (FLINK-9598) [Checkpoints] The config Minimum Pause Between Checkpoints doesn't work when there's a checkpoint failure

ASF GitHub Bot (JIRA) Tue, 07 Aug 2018 06:50:28 -0700


    [ https://issues.apache.org/jira/browse/FLINK-9598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16571678#comment-16571678 ]


ASF GitHub Bot commented on FLINK-9598:
---------------------------------------

zentol commented on issue #6346: [FLINK-9598] Refine java-doc about the min 
pause between checkpoints
URL: https://github.com/apache/flink/pull/6346#issuecomment-411062550
 
 
   After looking at the discussion threads, I'm not sure it makes sense to
merge this PR. If the behavior is deemed buggy, we shouldn't touch the javadocs
and should fix the behavior instead. One could argue that the javadocs should
still describe the _current_ state, but then we would end up switching them
back and forth between versions.



> [Checkpoints] The config Minimum Pause Between Checkpoints doesn't work when 
> there's a checkpoint failure
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-9598
>                 URL: https://issues.apache.org/jira/browse/FLINK-9598
>             Project: Flink
>          Issue Type: Bug
>    Affects Versions: 1.3.2
>            Reporter: Prem Santosh
>            Assignee: Yun Tang
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Screen Shot 2018-06-20 at 7.44.10 AM.png
>
>
> We have set the config Minimum Pause Between Checkpoints to 10 min but
> noticed that when a checkpoint fails (because it times out before it
> completes), the application immediately starts taking the next checkpoint.
> This effectively stalls the application's progress, since it is always
> taking checkpoints.
> [^Screen Shot 2018-06-20 at 7.44.10 AM.png] is a screenshot of this issue.
> Details:
>  * Running Flink-1.3.2 on EMR
>  * checkpoint timeout duration: 40 min
>  * minimum pause between checkpoints: 10 min
> There is also a [relevant 
> thread|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Having-a-backoff-while-experiencing-checkpointing-failures-td20618.html]
>  that I found on the Flink users group.
>  
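The behavior the reporter expected can be sketched as a small plain-Java model: the minimum pause is measured from the end of the most recent checkpoint attempt, whether that attempt completed or timed out. This is a minimal sketch under that assumption only; `MinPauseModel` and its methods are hypothetical names for illustration, not Flink's actual checkpoint-triggering code, and times are plain milliseconds.

```java
/**
 * Hypothetical model of the expected "minimum pause" rule (not Flink code).
 * The pause starts when a checkpoint attempt ends, regardless of outcome.
 */
final class MinPauseModel {
    private final long minPauseMillis;
    // Whether any checkpoint attempt has ended yet.
    private boolean anyAttemptEnded = false;
    // End time of the most recent attempt (completed or failed), in ms.
    private long lastAttemptEndMillis = 0L;

    MinPauseModel(long minPauseMillis) {
        this.minPauseMillis = minPauseMillis;
    }

    /**
     * Records that a checkpoint attempt ended at endMillis.
     * The 'succeeded' flag is deliberately ignored: under the expected rule,
     * a timed-out checkpoint starts the pause just like a completed one.
     */
    void onAttemptEnded(long endMillis, boolean succeeded) {
        anyAttemptEnded = true;
        lastAttemptEndMillis = endMillis;
    }

    /** Earliest time (ms) at which the next checkpoint may be triggered. */
    long earliestNextTrigger() {
        return anyAttemptEnded ? lastAttemptEndMillis + minPauseMillis : 0L;
    }

    /** True if a new checkpoint may be triggered at nowMillis. */
    boolean mayTrigger(long nowMillis) {
        return nowMillis >= earliestNextTrigger();
    }
}
```

With the values from the report (10 min pause, 40 min timeout), a checkpoint started at t = 0 that times out at t = 40 min would allow the next checkpoint no earlier than t = 50 min, instead of immediately at t = 40 min as observed.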



