Hi Kirti,

Simply speaking, the sink needs to support `two-phase commit`: the sink can `write` data as normal but only `commit` the data after the checkpoint succeeds. This ensures that even if a failover occurs and data needs to be replayed, the previously written but uncommitted data is never visible to the user. The trade-off is increased data latency: the data becomes visible only after the checkpoint completes and the data is committed, rather than immediately after the sink writes it.
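For illustration, here is a minimal sketch of that pattern built on Flink's `TwoPhaseCommitSinkFunction`. The temp-file layout, the `/tmp/sink` path, and the `String` payload type are illustrative assumptions, not a production sink; Flink's own FileSink already implements this idea properly.

```java
import java.io.BufferedWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.UUID;

import org.apache.flink.api.common.typeutils.base.StringSerializer;
import org.apache.flink.api.common.typeutils.base.VoidSerializer;
import org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction;

// Sketch: records are first written to a hidden temp file (the "write"
// phase) and the file is only renamed to its visible name after the
// checkpoint completes (the "commit" phase).
public class TmpFileTwoPhaseCommitSink
        extends TwoPhaseCommitSinkFunction<String, String, Void> {

    public TmpFileTwoPhaseCommitSink() {
        // Serializers for the transaction handle (a temp-file path) and
        // the (unused) context.
        super(StringSerializer.INSTANCE, VoidSerializer.INSTANCE);
    }

    @Override
    protected String beginTransaction() throws Exception {
        // Start a new "transaction": a fresh hidden temp file per checkpoint.
        return "/tmp/sink/.in-progress-" + UUID.randomUUID();
    }

    @Override
    protected void invoke(String txn, String value, Context ctx) throws Exception {
        // Write phase: data is persisted but not yet visible to readers.
        try (BufferedWriter w = Files.newBufferedWriter(
                Paths.get(txn),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            w.write(value);
            w.newLine();
        }
    }

    @Override
    protected void preCommit(String txn) throws Exception {
        // Called during the checkpoint; flush/close any open resources here.
    }

    @Override
    protected void commit(String txn) {
        // Commit phase: runs only after the checkpoint has completed, making
        // the data visible atomically via rename (drops the leading dot).
        try {
            Path src = Paths.get(txn);
            if (Files.exists(src)) {
                Path dst = src.resolveSibling(
                        src.getFileName().toString().substring(1));
                Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
            }
        } catch (Exception e) {
            throw new RuntimeException("Commit failed for " + txn, e);
        }
    }

    @Override
    protected void abort(String txn) {
        // Failover before the checkpoint completed: discard the invisible
        // data so the replayed records do not produce duplicates.
        try {
            Files.deleteIfExists(Paths.get(txn));
        } catch (Exception ignored) {
        }
    }
}
```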
Best,
Shammon FY

On Thu, Aug 3, 2023 at 12:23 PM Kirti Dhar Upadhyay K via user <
user@flink.apache.org> wrote:

> Hi Team,
>
> I am using Flink File Source in one of my use cases.
>
> I observed that, while reading a file, the source reader stores its
> position in the checkpointed data.
>
> If the application crashes, it restores its position from the
> checkpointed data once the application comes back up, which may result in
> re-emitting a few records that were already emitted between the last
> checkpoint and the crash.
>
> However, in the doc link
> https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/guarantees/
> I found that the File source ensures exactly-once delivery semantics with
> the help of the data sink:
>
> *"To guarantee end-to-end exactly-once record delivery (in addition to
> exactly-once state semantics), the data sink needs to take part in the
> checkpointing mechanism."*
>
> Can someone shed some light on this?
>
> Regards,
>
> Kirti Dhar