If your implementation only commits your changing after the complete of a checkpoint I think the latency of e2e is at least the interval of checkpoint.
I think the document wants to say that a topology, which only has flatmap/filter/map(no task has more than one input) could achieve the exactly once semantics even in at least mode since the effect of barrier alignments in at least mode is same as in exactly once mode by coincidence for such topology. I think there might be some benefits if you could set the parallelism of source/sink/flatmap to the same parallelism(there could exist other way) in some situation since during the alignments the task, which has many inputs would not deal with the elements behind the barrier in exactly mode until the barriers of all inputs arrive. (If your checkpoint interval is very very long I think there would be no difference). Best Guowei <min....@ubs.com>于2019年4月7日 周日上午3:14写道: > Hi, > > > > I have a simple data pipeline of a Kafka source, a flink map operator and > a Kafka sink. > > > > I have a quick question about latency caused by the checkpoint on the > exactly once mode. > > > > Due to the changes are committed and visible on a checkpoint completion, > so the latency could be as long as that length of checkpoint interval e.g. > 5seconds? > > > > Is my understanding correct? > > > > If I use the at least mode, there will be this addition on latency. More > interestingly, the flink document > https://ci.apache.org/projects/flink/flink-docs-release-1.7/internals/stream_checkpointing.html > indicate that "dataflows with only embarrassingly parallel streaming > operations (map(), flatMap(), filter(), …) actually give *exactly once* > guarantees > even in *at least once* mode." > > > > Unfortunately, I have been not able to achieve the exactly once with the > at least once. Do I need more settings than I have with the exactly once > mode? > > > > Many thanks for the advises in advance. > > > > Min > > > > > > > > > -- Best, Guowei