[ https://issues.apache.org/jira/browse/FLINK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289922#comment-17289922 ]
Piotr Nowojski edited comment on FLINK-21133 at 2/24/21, 1:33 PM: ------------------------------------------------------------------ [~kezhuw]: #2, if stop-with-savepoint fails, job can just continue. The problem is in case 3. and 4., where the only viable solution is I think to restart the job from last checkpoint/savepoint. I don't think this is happening right now? #3, please check FLIP-147 dev mailing list discussion. There is my proposal how to address this problem. {quote} If this were the case and if we could still create a checkpoint after a StreamOperator.close has been called, then we could simply send a EndOfPartitionEnvent when an operator reaches the end of input or receives the terminate call. Next, the StreamTask would only have to wait for a checkpoint to succeed before terminating. {quote} That would brake the current {{@Public}} API. For example our own {{FlinkKafkaProducer}} would stop working :( And as you mentioned [~trohrmann] it wouldn't solve the problem of shutting the operators one by one if they need to emit something in the {{notifyCheckpointComplete()}}. was (Author: pnowojski): [~kezhuw] #2, if stop-with-savepoint fails, job can just continue. The problem is in case 3. and 4., where the only viable solution is I think to restart the job from last checkpoint/savepoint. I don't think this is happening right now? #3, please check FLIP-147 dev mailing list discussion. There is my proposal how to address this problem. {quote} If this were the case and if we could still create a checkpoint after a StreamOperator.close has been called, then we could simply send a EndOfPartitionEnvent when an operator reaches the end of input or receives the terminate call. Next, the StreamTask would only have to wait for a checkpoint to succeed before terminating. {quote} That would brake the current {{@Public}} API. For example our own {{FlinkKafkaProducer}} would stop working :( And as you mentioned [~trohrmann] it wouldn't solve the problem of shutting the operators one by one if they need to emit something in the {{notifyCheckpointComplete()}}. > FLIP-27 Source does not work with synchronous savepoint > ------------------------------------------------------- > > Key: FLINK-21133 > URL: https://issues.apache.org/jira/browse/FLINK-21133 > Project: Flink > Issue Type: Bug > Components: API / Core, API / DataStream, Runtime / Checkpointing > Affects Versions: 1.11.3, 1.12.1 > Reporter: Kezhu Wang > Priority: Critical > Fix For: 1.11.4, 1.13.0, 1.12.3 > > > I have pushed branch > [synchronous-savepoint-conflict-with-bounded-end-input-case|https://github.com/kezhuw/flink/commits/synchronous-savepoint-conflict-with-bounded-end-input-case] > in my repository. {{SavepointITCase.testStopSavepointWithFlip27Source}} > failed due to timeout. > See also FLINK-21132 and > [apache/iceberg#2033|https://github.com/apache/iceberg/issues/2033].. -- This message was sent by Atlassian Jira (v8.3.4#803005)