[ https://issues.apache.org/jira/browse/FLINK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17288467#comment-17288467 ]
Till Rohrmann edited comment on FLINK-21133 at 2/25/21, 8:45 AM (Author: pnowojski):
-----------------------------------------------------------------

+1 for those use cases/semantics summarised by [~trohrmann]. I agree that 3. and 4. are also effectively the same.

To tie up the various loose threads that we have had here, I see the following, mostly independent, issues:

a) Two-phase commit support for 3. and 4. This will be dealt with by FLIP-147 (please check the discussion on the dev mailing list).

b) Unfortunately, in FLINK-21132 we broke 3. (*stop-with-savepoint --drain*). In this case, {{endOfInput()}} should be called (CC [~roman_khachatryan]). Otherwise, some operators do not flush/drain their buffered state (for example {{AsyncWaitOperator}}, which does so only in the {{endOfInput()}} call). Note that before FLINK-21132, 3. worked correctly only if we ignore the issue of committing side effects (two-phase commit support). FLINK-21453 will fix this problem.

c) Changing 2. from "stop with savepoint" to "cancel with savepoint". Previously I thought of it as a refactoring/clean-up AND an optimisation (speeding up the shutdown). However, as we cannot use this approach for 3., I think it is just an optimisation that would make the code base diverge. For this reason I think it would be better to postpone this optimisation until after FLIP-147 is done (if ever).

d) FLIP-27 sources not supporting stop-with-savepoint (both 3. and 4.).

> FLIP-27 Source does not work with synchronous savepoint
> -------------------------------------------------------
>
> Key: FLINK-21133
> URL: https://issues.apache.org/jira/browse/FLINK-21133
> Project: Flink
> Issue Type: Bug
> Components: API / Core, API / DataStream, Runtime / Checkpointing
> Affects Versions: 1.11.3, 1.12.1
> Reporter: Kezhu Wang
> Priority: Critical
> Fix For: 1.11.4, 1.13.0, 1.12.3
>
>
> I have pushed branch
> [synchronous-savepoint-conflict-with-bounded-end-input-case|https://github.com/kezhuw/flink/commits/synchronous-savepoint-conflict-with-bounded-end-input-case]
> in my repository. {{SavepointITCase.testStopSavepointWithFlip27Source}}
> failed due to timeout.
> See also FLINK-21132 and
> [apache/iceberg#2033|https://github.com/apache/iceberg/issues/2033].

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
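
To illustrate point b) of the comment, here is a minimal, self-contained sketch of why skipping {{endOfInput()}} on a draining stop leaves buffered elements unflushed. It does not use Flink's classes: {{EndOfInputAware}}, {{BufferingOperator}} and {{stopWithSavepoint}} are hypothetical names invented for this sketch, and the operator is only a toy stand-in for something like {{AsyncWaitOperator}}.

```java
import java.util.ArrayList;
import java.util.List;

public class DrainSketch {

    /** Hypothetical stand-in for an end-of-input callback; not Flink's real interface. */
    interface EndOfInputAware {
        void endOfInput();
    }

    /** Toy operator that buffers elements and only emits them once input has ended. */
    static final class BufferingOperator implements EndOfInputAware {
        private final List<String> buffer = new ArrayList<>();
        private final List<String> emitted = new ArrayList<>();

        void processElement(String element) {
            buffer.add(element); // held back, e.g. waiting for an async result
        }

        @Override
        public void endOfInput() {
            // Flush everything still buffered; this is the only flush path in this toy.
            emitted.addAll(buffer);
            buffer.clear();
        }

        List<String> emitted() {
            return new ArrayList<>(emitted);
        }
    }

    /**
     * Simulates stopping a task. With drain=true the operator is told that input
     * has ended; with drain=false the task simply stops and the buffer stays unflushed.
     */
    public static List<String> stopWithSavepoint(boolean drain) {
        BufferingOperator operator = new BufferingOperator();
        operator.processElement("a");
        operator.processElement("b");
        if (drain) {
            operator.endOfInput();
        }
        return operator.emitted();
    }

    public static void main(String[] args) {
        System.out.println("drain=true  emitted " + stopWithSavepoint(true));
        System.out.println("drain=false emitted " + stopWithSavepoint(false));
    }
}
```

In this toy model, the regression described in b) corresponds to taking the {{drain=false}} path even though a drain was requested.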