Your expectations are correct. In case of AT_LEAST_ONCE Flink will wait for all outstanding records in the Kafka buffers to be acknowledged before marking the checkpoint successful (=also recording the offsets of the sources). That said, there might be other factors involved that could lead to a different output even when reading the same data from the sources - such as using using processing time (instead of event time) or doing some sort of lookup calls to external systems. If you absolutely cannot think of a scenario where this could be the case for your application, please try to reproduce the error reliably - this is something that needs to be further looked into.
Best, Alexander Fedulov On Mon, 23 Oct 2023 at 19:11, Gabriele Modena <gmod...@wikimedia.org> wrote: > Hey folks, > > We currently run (py) flink 1.17 on k8s (managed by flink k8s > operator), with HA and checkpointing (fixed retries policy). We > produce into Kafka with AT_LEAST_ONCE delivery guarantee. > > Our application failed when trying to produce a message larger than > Kafka's message larger than message.max.bytes. This offset was never > going to be committed, so Flink HA was not able to recover the > application. > > Upon a manual restart, it looks like the offending offset has been > lost: it was not picked after rewinding to the checkpointed offset, > and it was not committed to Kafka. I would have expected this offset > to not have made it past the KafkaProducer commit checkpoint barrier, > and that the app would fail again on it. > > I understand that there are failure scenarios that could end in data > loss when Kafka delivery guarantee is set to EXACTLY_ONCE and kafka > expires an uncommitted transaction. > > However, it's unclear to me if other corner cases would apply to > AT_LEAST_ONCE guarantees. Upon broker failure and app restarts, I > would expect duplicate messages but no data loss. What I can see as a > problem is that this commit was never going to happen. > > Is this expected behaviour? Am I missing something here? > > Cheers, > -- > Gabriele Modena (he / him) > Staff Software Engineer > Wikimedia Foundation >