Hi Kirti,

Simply speaking, the sink needs to support `two-phase commit`: the sink can `write` data as normal but only `commit` the data after the checkpoint succeeds. This ensures that even if a failover occurs and data needs to be replayed, the previously written but uncommitted data is never visible to the user. The trade-off is increased data latency: the data becomes visible only after the checkpoint completes and the data is committed, rather than immediately after the sink writes it.
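For illustration, here is a minimal sketch of that pattern built on Flink's `TwoPhaseCommitSinkFunction`. The temp-file layout, the `/tmp/sink` path, and the `String` payload type are illustrative assumptions, not a production sink; Flink's own FileSink already implements this idea properly.

```java
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

import org.apache.flink.api.common.typeutils.base.StringSerializer;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

// Sketch: records are first written to a hidden temp file (the "write"
// phase) and the file is only renamed to its visible name after the
// checkpoint completes (the "commit" phase).
public class TmpFileTwoPhaseCommitSink
        extends TwoPhaseCommitSinkFunction<String, String, Void> {

    public TmpFileTwoPhaseCommitSink() {
        // Serializers for the transaction handle (a temp-file path) and
        // the (unused) context.
        super(StringSerializer.INSTANCE, VoidSerializer.INSTANCE);
    }

    @Override
    protected String beginTransaction() throws Exception {
        // Start a new "transaction": a fresh hidden temp file per checkpoint.
        return "/tmp/sink/.in-progress-" + UUID.randomUUID();
    }

    @Override
    protected void invoke(String txn, String value, Context ctx) throws Exception {
        // Write phase: data is persisted but not yet visible to readers.
        try (BufferedWriter w = Files.newBufferedWriter(
                Paths.get(txn),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            w.write(value);
            w.newLine();
        }
    }

    @Override
    protected void preCommit(String txn) throws Exception {
        // Called during the checkpoint; flush/close any open resources here.
    }

    @Override
    protected void commit(String txn) {
        // Commit phase: runs only after the checkpoint has completed, making
        // the data visible atomically via rename (drops the leading dot).
        try {
            Path src = Paths.get(txn);
            if (Files.exists(src)) {
                Path dst = src.resolveSibling(
                        src.getFileName().toString().substring(1));
                Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
            }
        } catch (Exception e) {
            throw new RuntimeException("Commit failed for " + txn, e);
        }
    }

    @Override
    protected void abort(String txn) {
        // Failover before the checkpoint completed: discard the invisible
        // data so the replayed records do not produce duplicates.
        try {
            Files.deleteIfExists(Paths.get(txn));
        } catch (Exception ignored) {
        }
    }
}
```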
Best,
Shammon FY

On Thu, Aug 3, 2023 at 12:23 PM Kirti Dhar Upadhyay K via user <
user@flink.apache.org> wrote:

> Hi Team,
>
> I am using Flink File Source in one of my use cases.
>
> I observed that, while reading a file, the source reader stores its
> position in the checkpointed data.
>
> If the application crashes, it restores its position from the
> checkpointed data once the application comes back up, which may result in
> re-emitting a few records that were already emitted between the last
> checkpoint and the crash.
>
> However, in the doc link
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/guarantees/
> I found that the File source ensures exactly-once delivery semantics with
> the help of the data sink:
>
> *"To guarantee end-to-end exactly-once record delivery (in addition to
> exactly-once state semantics), the data sink needs to take part in the
> checkpointing mechanism."*
>
> Can someone shed some light on this?
>
> Regards,
>
> Kirti Dhar