[ 
https://issues.apache.org/jira/browse/FLINK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288467#comment-17288467
 ] 

Piotr Nowojski edited comment on FLINK-21133 at 2/23/21, 8:54 AM:
------------------------------------------------------------------

+1 for those use cases/semantics summarised by [~trohrmann]. I agree that 3. 
and 4. are also effectively the same. To wrap up the various loose threads 
that we had here, I see the following, mostly independent, issues:

a) Two phase commit support for 3. and 4. This will be dealt with by FLIP-147 
(please check the discussion on the dev mailing list).
b) Unfortunately, in FLINK-21132 we broke 3. (*stop-with-savepoint --drain*). In 
this case, {{endOfInput()}} should be called (CC [~roman_khachatryan]). 
Otherwise, some operators do not flush/drain their buffered state (for 
example {{AsyncWaitOperator}}, which does so only in the 
{{endOfInput()}} call). Note that before FLINK-21132, 3. worked correctly 
only if we ignore the issue of committing side effects (two phase commit 
support).
c) Changing 2. from "stop with savepoint" to "cancel with savepoint". 
Previously I thought about it as a refactor/clean up AND an optimisation (speed 
up of the shutdown). However, as we cannot use this approach for 3., I think 
it's just an optimisation that would make the code base diverge. For this 
reason I think it would be better to postpone such an optimisation until after 
FLIP-147 is done (if ever).
d) FLIP-27 not supporting stop with savepoint (both 3. and 4.)
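
To illustrate point b), here is a minimal, simplified sketch of why skipping {{endOfInput()}} on a drain loses buffered records. It is plain Java and not Flink's runtime code: {{EndOfInputAware}}, {{BufferingOperator}} and {{stop(...)}} are hypothetical names made up for this example, and the interface only stands in for the operator hook referred to above as {{endOfInput()}}.

```java
import java.util.ArrayList;
import java.util.List;

public class DrainSketch {

    /** Hypothetical stand-in for the operator hook; models only endOfInput(). */
    interface EndOfInputAware {
        void endOfInput();
    }

    /** Hypothetical operator that buffers records and emits them only on endOfInput(). */
    static class BufferingOperator implements EndOfInputAware {
        private final List<String> buffer = new ArrayList<>();
        private final List<String> output = new ArrayList<>();

        void processElement(String record) {
            buffer.add(record); // held back, not emitted yet
        }

        @Override
        public void endOfInput() {
            output.addAll(buffer); // flush everything that was buffered
            buffer.clear();
        }

        List<String> output() {
            return output;
        }

        int pending() {
            return buffer.size();
        }
    }

    /** Assumed model of stopping a task: only a drain signals the end of input. */
    static List<String> stop(BufferingOperator op, boolean drain) {
        if (drain) {
            op.endOfInput();
        }
        return op.output();
    }

    public static void main(String[] args) {
        BufferingOperator op = new BufferingOperator();
        op.processElement("a");
        op.processElement("b");
        System.out.println("emitted without drain: " + stop(op, false).size());
        System.out.println("emitted with drain: " + stop(op, true).size());
    }
}
```

In this model, stopping without a drain leaves both records in the buffer, while stopping with a drain emits them. That is the behaviour 3. relies on.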



> FLIP-27 Source does not work with synchronous savepoint
> -------------------------------------------------------
>
>                 Key: FLINK-21133
>                 URL: https://issues.apache.org/jira/browse/FLINK-21133
>             Project: Flink
>          Issue Type: Bug
>          Components: API / Core, API / DataStream, Runtime / Checkpointing
>    Affects Versions: 1.11.3, 1.12.1
>            Reporter: Kezhu Wang
>            Priority: Critical
>             Fix For: 1.11.4, 1.13.0, 1.12.3
>
>
> I have pushed branch 
> [synchronous-savepoint-conflict-with-bounded-end-input-case|https://github.com/kezhuw/flink/commits/synchronous-savepoint-conflict-with-bounded-end-input-case]
>  in my repository. {{SavepointITCase.testStopSavepointWithFlip27Source}} 
> failed due to timeout.
> See also FLINK-21132 and 
> [apache/iceberg#2033|https://github.com/apache/iceberg/issues/2033].


