Your expectations are correct. In case of AT_LEAST_ONCE  Flink will wait
for all outstanding records in the Kafka buffers to be acknowledged before
marking the checkpoint successful (=also recording the offsets of the
sources). That said, there might be other factors involved that could lead
to a different output even when reading the same data from the sources -
such as using using processing time (instead of event time) or doing some
sort of lookup calls to external systems. If you absolutely cannot think of
a scenario where this could be the case for your application, please try to
reproduce the error reliably - this is something that needs to be
further looked into.

Best,
Alexander Fedulov

On Mon, 23 Oct 2023 at 19:11, Gabriele Modena <gmod...@wikimedia.org> wrote:

> Hey folks,
>
> We currently run (py) flink 1.17 on k8s (managed by flink k8s
> operator), with HA and checkpointing (fixed retries policy). We
> produce into Kafka with AT_LEAST_ONCE delivery guarantee.
>
> Our application failed when trying to produce a message larger than
> Kafka's message larger than message.max.bytes. This offset was never
> going to be committed, so Flink HA was not able to recover the
> application.
>
> Upon a manual restart, it looks like the offending offset has been
> lost: it was not picked after rewinding to the checkpointed offset,
> and it was not committed to Kafka. I would have expected this offset
> to not have made it past the KafkaProducer commit checkpoint barrier,
> and that the app would fail again on it.
>
> I understand that there are failure scenarios that could end in data
> loss when Kafka delivery guarantee is set to EXACTLY_ONCE and kafka
> expires an uncommitted transaction.
>
> However, it's unclear to me if other corner cases would apply to
> AT_LEAST_ONCE guarantees. Upon broker failure and app restarts, I
> would expect duplicate messages but no data loss. What I can see as a
> problem is that this commit was never going to happen.
>
> Is this expected behaviour? Am I missing something here?
>
> Cheers,
> --
> Gabriele Modena (he / him)
> Staff Software Engineer
> Wikimedia Foundation
>

Reply via email to