[ https://issues.apache.org/jira/browse/FLINK-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845852#comment-16845852 ]
vinoyang commented on FLINK-8871:
---------------------------------

[~carp84] In what way is my behavior similar to yours? What vulgar vocabulary did I use in communicating with you? Please point it out and I will humbly correct it. I don't care whether you use quotes or not; I never use such vocabulary with others in the community. This is a kind of slander, no doubt!

I asked him to wait, but he could also have chosen to set the assignee to himself. Whether that was out of respect is only your guess. I have said that I did not stop him from working on this issue. I simply did not explain my action, and I have admitted that. Does this need to be raised to a moral level?

Please take a look at your own comments from beginning to end. I try to focus on explaining and commenting on the problem itself, while you always comment on me in a roundabout way. Is this the way you participate in the community?

If the suggestion to wait were my own conclusion, you could speculate about me as you like and I would have no objection, but I have stated that it is a suggestion from Stephan. Otherwise I might already have submitted a PR before Yun's comment (you should know it was November 2018).

Why does this issue's PR not need to wait? Do you think Stephan's suggestion is unfounded? Please also take a look at [Yun Tang's own comment|https://issues.apache.org/jira/browse/FLINK-8871?focusedCommentId=16789755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16789755]: these two APIs are similar. If one of them needs to be refactored because of FLINK-12477, based on the existing implementation, why would the other one not need to be? I really don't know how to explain it.
> Checkpoint cancellation is not propagated to stop checkpointing threads on the task manager
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8871
>                 URL: https://issues.apache.org/jira/browse/FLINK-8871
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.3.2, 1.4.1, 1.5.0, 1.6.0, 1.7.0
>            Reporter: Stefan Richter
>            Priority: Critical
>
> Flink currently lacks any form of feedback mechanism from the job manager /
> checkpoint coordinator to the tasks when it comes to failing a checkpoint.
> This means that running snapshots on the tasks are also not stopped even if
> their owning checkpoint is already cancelled. Two examples for cases where
> this applies are checkpoint timeouts and local checkpoint failures on a task
> together with a configuration that does not fail tasks on checkpoint failure.
> Notice that those running snapshots do no longer account for the maximum
> number of parallel checkpoints, because their owning checkpoint is considered
> as cancelled.
> Not stopping the task's snapshot thread can lead to a problematic situation
> where the next checkpoints already started, while the abandoned checkpoint
> thread from a previous checkpoint is still lingering around running. This
> scenario can potentially cascade: many parallel checkpoints will slow down
> checkpointing and make timeouts even more likely.
>
> A possible solution is introducing a {{cancelCheckpoint}} method as
> counterpart to the {{triggerCheckpoint}} method in the task manager gateway,
> which is invoked by the checkpoint coordinator as part of cancelling the
> checkpoint.
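
For readers following the issue description, here is a minimal sketch of the proposed {{cancelCheckpoint}} counterpart to {{triggerCheckpoint}}. It is not Flink code: the class {{TaskSnapshotRegistry}}, its method signatures, and the thread pool are all assumptions made for illustration. It only shows how a task-side component could track running snapshot threads by checkpoint id and interrupt them when the coordinator cancels the checkpoint.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

// Hypothetical task-side registry; none of these names come from Flink itself.
class TaskSnapshotRegistry {

    // Snapshots that have been triggered and not yet finished or cancelled.
    private final ConcurrentMap<Long, FutureTask<Void>> running = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newCachedThreadPool();

    // Starts the snapshot work for a checkpoint; returns false if that id is already running.
    boolean triggerCheckpoint(long checkpointId, Runnable snapshotWork) {
        FutureTask<Void> task = new FutureTask<>(snapshotWork, null);
        if (running.putIfAbsent(checkpointId, task) != null) {
            return false;
        }
        executor.execute(() -> {
            try {
                task.run(); // does nothing if the task was cancelled before it started
            } finally {
                running.remove(checkpointId, task); // drop the entry once the work ends
            }
        });
        return true;
    }

    // Counterpart to triggerCheckpoint: interrupts the snapshot thread of a cancelled checkpoint.
    // Returns true only if a snapshot that was still registered has been cancelled.
    boolean cancelCheckpoint(long checkpointId) {
        FutureTask<Void> task = running.remove(checkpointId);
        return task != null && task.cancel(true);
    }

    // Interrupts all remaining snapshot threads and stops the pool.
    void shutdown() {
        executor.shutdownNow();
    }
}
```

The sketch relies on the snapshot work reacting to thread interruption; work that ignores the interrupt flag would keep running, which is the lingering-thread situation the description warns about.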