[jira] [Commented] (FLINK-13497) Checkpoints can complete after CheckpointFailureManager fails job

Piotr Nowojski (JIRA) Tue, 06 Aug 2019 01:40:13 -0700


    [ https://issues.apache.org/jira/browse/FLINK-13497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900776#comment-16900776 ]


Piotr Nowojski commented on FLINK-13497:
----------------------------------------

[~SleePy] we are also not aware of any performance problems in this area. My 
gut feeling is that we would need hundreds of thousands of operations per 
second on the JobManager main executor thread before we overload it, and as far 
as I know, we haven't observed this to be an issue. If it becomes an issue, we 
can analyse it and then make an informed decision about what to do: optimise 
the code (that code was never written with performance in mind, so there is 
definitely lots of room for improvement) or spread the workload over more 
threads.

> Checkpoints can complete after CheckpointFailureManager fails job
> -----------------------------------------------------------------
>
>                 Key: FLINK-13497
>                 URL: https://issues.apache.org/jira/browse/FLINK-13497
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.9.0, 1.10.0
>            Reporter: Till Rohrmann
>            Priority: Critical
>             Fix For: 1.10.0
>
>
> I think that with FLINK-12364 we introduced an inconsistency with respect to 
> job termination and checkpointing. In FLINK-9900 it was discovered that 
> checkpoints can complete even after the {{CheckpointFailureManager}} has 
> decided to fail a job. I think the expected behaviour should be that we fail 
> all pending checkpoints once the {{CheckpointFailureManager}} decides to fail 
> the job.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
