[ 
https://issues.apache.org/jira/browse/FLINK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289922#comment-17289922
 ] 

Piotr Nowojski edited comment on FLINK-21133 at 2/24/21, 1:33 PM:
------------------------------------------------------------------

[~kezhuw]:
#2, if stop-with-savepoint fails, job can just continue. The problem is in case 
3. and 4., where the only viable solution is I think to restart the job from 
last checkpoint/savepoint. I don't think this is happening right now?

#3, please check FLIP-147 dev mailing list discussion. There is my proposal how 
to address this problem.

{quote}
If this were the case and if we could still create a checkpoint after a 
StreamOperator.close has been called, then we could simply send a 
EndOfPartitionEnvent when an operator reaches the end of input or receives the 
terminate call. Next, the StreamTask would only have to wait for a checkpoint 
to succeed before terminating.
{quote}
That would brake the current {{@Public}} API. For example our own 
{{FlinkKafkaProducer}} would stop working :( And as you mentioned [~trohrmann] 
it wouldn't solve the problem of shutting the operators one by one if they need 
to emit something in the {{notifyCheckpointComplete()}}.


was (Author: pnowojski):
[~kezhuw]
#2, if stop-with-savepoint fails, job can just continue. The problem is in case 
3. and 4., where the only viable solution is I think to restart the job from 
last checkpoint/savepoint. I don't think this is happening right now?

#3, please check FLIP-147 dev mailing list discussion. There is my proposal how 
to address this problem.

{quote}
If this were the case and if we could still create a checkpoint after a 
StreamOperator.close has been called, then we could simply send a 
EndOfPartitionEnvent when an operator reaches the end of input or receives the 
terminate call. Next, the StreamTask would only have to wait for a checkpoint 
to succeed before terminating.
{quote}
That would brake the current {{@Public}} API. For example our own 
{{FlinkKafkaProducer}} would stop working :( And as you mentioned [~trohrmann] 
it wouldn't solve the problem of shutting the operators one by one if they need 
to emit something in the {{notifyCheckpointComplete()}}.

> FLIP-27 Source does not work with synchronous savepoint
> -------------------------------------------------------
>
>                 Key: FLINK-21133
>                 URL: https://issues.apache.org/jira/browse/FLINK-21133
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Core, API / DataStream, Runtime / Checkpointing
>    Affects Versions: 1.11.3, 1.12.1
>            Reporter: Kezhu Wang
>            Priority: Critical
>             Fix For: 1.11.4, 1.13.0, 1.12.3
>
>
> I have pushed branch 
> [synchronous-savepoint-conflict-with-bounded-end-input-case|https://github.com/kezhuw/flink/commits/synchronous-savepoint-conflict-with-bounded-end-input-case]
>  in my repository. {{SavepointITCase.testStopSavepointWithFlip27Source}} 
> failed due to timeout.
> See also FLINK-21132 and 
> [apache/iceberg#2033|https://github.com/apache/iceberg/issues/2033]..



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to