[ https://issues.apache.org/jira/browse/FLINK-8871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845747#comment-16845747 ]
Yu Li commented on FLINK-8871:
------------------------------

bq. Please pay attention to your words, I think you are close to attacking me.

Interesting. I'm just stating the truth, and everyone is watching.

bq. How do you think that I did not give advice? Is this not a suggestion?

"Your solution sounds good, but it would wait for other things to be done." You call this a suggestion? Seriously? Did you mention anything about how to resolve the issue described here? Or give any proposal? If you insist this is a suggestion, fine, but it's an invalid one with no value.

bq. I don't think it makes sense to compare the time the issue was created.

No, it makes a lot of sense. It is common sense that a JIRA created later with the same target should be marked as a duplicate and closed.

bq. Regarding FLINK-12482, obviously you don't understand a lot of details... What is the value? And the introduction of the actor mode is related. The implementation is completely different.

I'm not sure why you are so confident that I'm not aware of the details. FLINK-12477 mainly targets changing the threading model of {{StreamTask}} and plans to use something like {{Disruptor/RingBuffer}} to implement lock-less event handling, while here in FLINK-8871 we mainly target adding a {{notifyCheckpointAbort}} process in {{CheckpointCoordinator}}, responding to this process in {{StreamTask}}, and executing checkpoint cancellation in {{StreamTask}}. Why do you think these two things depend on each other? Why do you think the change on the producer/consumer has to wait for the change of the message queue implementation?

bq. Can you let Yun Tang express his own opinion?

I'm speaking for myself now, and Yun is free to express his own opinion. Please stop your ridiculous assumptions, thanks.

bq. I mentioned CheckpointFailureManager because I expect a checkpoint exception to be processed

It's OK if you suggest implementing checkpoint cancellation in {{CheckpointFailureManager}}, but not if you think you're the only qualified candidate to implement checkpoint-related stuff since you're the author of the {{CheckpointFailureManager}}, and use that as an excuse to grab this JIRA.

> Checkpoint cancellation is not propagated to stop checkpointing threads on
> the task manager
> -------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8871
>                 URL: https://issues.apache.org/jira/browse/FLINK-8871
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Checkpointing
>    Affects Versions: 1.3.2, 1.4.1, 1.5.0, 1.6.0, 1.7.0
>            Reporter: Stefan Richter
>            Assignee: vinoyang
>            Priority: Critical
>
> Flink currently lacks any form of feedback mechanism from the job manager /
> checkpoint coordinator to the tasks when it comes to failing a checkpoint.
> This means that running snapshots on the tasks are not stopped even if
> their owning checkpoint is already cancelled. Two examples of cases where
> this applies are checkpoint timeouts and local checkpoint failures on a task
> together with a configuration that does not fail tasks on checkpoint failure.
> Notice that those running snapshots no longer count toward the maximum
> number of parallel checkpoints, because their owning checkpoint is considered
> cancelled.
> Not stopping the task's snapshot thread can lead to a problematic situation
> where the next checkpoints have already started while the abandoned checkpoint
> thread from a previous checkpoint is still running. This
> scenario can potentially cascade: many parallel checkpoints will slow down
> checkpointing and make timeouts even more likely.
>
> A possible solution is introducing a {{cancelCheckpoint}} method as a
> counterpart to the {{triggerCheckpoint}} method in the task manager gateway,
> which is invoked by the checkpoint coordinator as part of cancelling the
> checkpoint.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
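
For illustration only, the idea in the issue description (a {{cancelCheckpoint}} counterpart to {{triggerCheckpoint}} that stops a lingering snapshot thread) could be sketched roughly as below. This is a minimal sketch under stated assumptions, not Flink's actual gateway API: the class name {{TaskGatewaySketch}}, the method signatures, and the use of a plain executor with {{Future.cancel(true)}} are all hypothetical and only show the shape of the feedback path.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for a task-side gateway; NOT Flink's real API. */
class TaskGatewaySketch {

    // Daemon threads so a lingering snapshot cannot keep the JVM alive.
    private final ExecutorService snapshotPool = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "snapshot-thread");
        t.setDaemon(true);
        return t;
    });

    // Snapshots that were triggered and not yet cancelled, keyed by checkpoint id.
    // A fuller implementation would also remove entries when a snapshot completes.
    private final Map<Long, Future<?>> runningSnapshots = new ConcurrentHashMap<>();

    /** Starts the asynchronous snapshot work for the given checkpoint id. */
    void triggerCheckpoint(long checkpointId, Runnable snapshotWork) {
        runningSnapshots.put(checkpointId, snapshotPool.submit(snapshotWork));
    }

    /**
     * Assumed counterpart to triggerCheckpoint: interrupts the snapshot
     * belonging to a checkpoint that the coordinator has already given up on.
     *
     * @return true if a tracked, unfinished snapshot was cancelled
     */
    boolean cancelCheckpoint(long checkpointId) {
        Future<?> snapshot = runningSnapshots.remove(checkpointId);
        // cancel(true) interrupts the worker thread if the task already started;
        // it returns false if the task had completed before the call.
        return snapshot != null && snapshot.cancel(true);
    }
}
```

In this sketch the coordinator side would call {{cancelCheckpoint(id)}} when a checkpoint times out or is declined. Interruption only helps if the snapshot work actually reacts to it (for example by checking the interrupt flag or blocking in interruptible calls), which is an assumption about the snapshot code rather than something the issue states.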