Hi folks, I did some initial investigation, and the problem seems twofold.
If no post-commit topology is used, we do not run into a problem where we could lose data but since we do not clean up the state correctly, we will hit this [1] when trying to stop the pipeline with a savepoint after we have started it from a savepoint. AFAICT all two-phase commit sinks are affected Kafka, File etc. For sinks using the post-commit topology, the same applies. Additionally, we might never do the commit from the post-commit topology resulting in lost data. Best, Fabian [1] https://github.com/apache/flink/blob/ed46cb2fd64f1cb306ae5b7654d2b4d64ab69f22/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/operators/sink/committables/CheckpointCommittableManagerImpl.java#L83