Hi Min,

Guowei is right: the comment in the documentation about exactly-once in embarrassingly parallel dataflows refers to exactly-once *state consistency*, not *end-to-end* exactly-once. However, in strictly forwarding pipelines, enabling exactly-once checkpoints should have no drawbacks compared to at-least-once, since there won't be any barrier alignment anyway. I'd simply enable exactly-once.
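Here is a toy Java model of how a single task handles barriers, which makes the alignment argument concrete. It is a sketch under simplifying assumptions, not Flink's implementation. `BarrierAlignmentSketch`, `Element` and `snapshotContents` are hypothetical names, only one snapshot is modeled, `align = true` stands for exactly-once barrier handling, and `align = false` stands for at-least-once. With a single input channel (a forwarding pipeline such as Kafka source -> map -> Kafka sink), both modes take the same snapshot. With two input channels, at-least-once mode lets `a2`, which arrived behind the barrier on channel 0, into the snapshot. After a restore, the sources rewind to the barrier position, so `a2` would be applied a second time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model (hypothetical, not Flink code) of one task receiving checkpoint barriers.
 * align = true models exactly-once barrier alignment; align = false models at-least-once.
 */
public class BarrierAlignmentSketch {

    /** An input element: a data record or a checkpoint barrier on some input channel. */
    static final class Element {
        final int channel;
        final String value; // null marks a barrier
        Element(int channel, String value) { this.channel = channel; this.value = value; }
        boolean isBarrier() { return value == null; }
        static Element data(int ch, String v) { return new Element(ch, v); }
        static Element barrier(int ch) { return new Element(ch, null); }
    }

    /** Records already applied to state when the barrier from the last channel arrives. */
    static List<String> snapshotContents(List<Element> input, int numChannels, boolean align) {
        List<String> state = new ArrayList<>();
        List<String> heldBack = new ArrayList<>(); // would be released after the snapshot
        Set<Integer> barriersSeen = new HashSet<>();
        for (Element e : input) {
            if (e.isBarrier()) {
                barriersSeen.add(e.channel);
                if (barriersSeen.size() == numChannels) {
                    return new ArrayList<>(state); // snapshot is taken here
                }
            } else if (align && barriersSeen.contains(e.channel)) {
                heldBack.add(e.value); // exactly-once: block this channel until aligned
            } else {
                state.add(e.value); // processed immediately
            }
        }
        throw new IllegalStateException("not all barriers arrived");
    }

    /** One input channel, as with forward connections between source, map and sink. */
    static List<Element> singleInput() {
        return List.of(Element.data(0, "r1"), Element.data(0, "r2"),
                Element.barrier(0), Element.data(0, "r3"));
    }

    /** Two input channels (e.g. after a repartitioning); channel 1's barrier lags. */
    static List<Element> twoInputs() {
        return List.of(Element.data(0, "a1"), Element.barrier(0),
                Element.data(0, "a2"), Element.data(1, "b1"), Element.barrier(1));
    }

    public static void main(String[] args) {
        System.out.println(snapshotContents(singleInput(), 1, true));  // [r1, r2]
        System.out.println(snapshotContents(singleInput(), 1, false)); // [r1, r2]
        System.out.println(snapshotContents(twoInputs(), 2, true));    // [a1, b1]
        System.out.println(snapshotContents(twoInputs(), 2, false));   // [a1, a2, b1]
    }
}
```

The two modes only differ in the `heldBack` branch, which can only be reached when a task has more than one input channel.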
If you enable end-to-end exactly-once for a Kafka sink, the latency will be at least the checkpointing interval (maybe even more, depending on how you configure Flink's checkpointing mechanism and event time / watermarks).

Best,
Fabian

On Sun, 7 Apr 2019 at 08:41, Guowei Ma <guowei....@gmail.com> wrote:
> If your implementation only commits its changes after a checkpoint completes, I think the end-to-end latency is at least the checkpoint interval.
>
> I think the documentation means that a topology with only flatMap/filter/map operators (no task has more than one input) can achieve exactly-once semantics even in at-least-once mode, because for such a topology the barrier handling in at-least-once mode happens to have the same effect as barrier alignment in exactly-once mode.
>
> I think there might be some benefit in setting the source, sink and flatMap to the same parallelism in some situations (there could be other ways), because in exactly-once mode a task with many inputs does not process the elements behind a barrier until the barriers from all of its inputs have arrived. (If your checkpoint interval is very long, I think there would be no difference.)
>
> Best
> Guowei
>
> On Sun, 7 Apr 2019 at 3:14 AM, <min....@ubs.com> wrote:
>
>> Hi,
>>
>> I have a simple data pipeline of a Kafka source, a Flink map operator and a Kafka sink.
>>
>> I have a quick question about the latency caused by checkpointing in exactly-once mode.
>>
>> Because the changes are committed and become visible on checkpoint completion, could the latency be as long as the checkpoint interval, e.g. 5 seconds?
>>
>> Is my understanding correct?
>>
>> If I use at-least-once mode, there will be no such additional latency.
>> More interestingly, the Flink documentation
>> https://ci.apache.org/projects/flink/flink-docs-release-1.7/internals/stream_checkpointing.html
>> states that "dataflows with only embarrassingly parallel streaming operations (map(), flatMap(), filter(), …) actually give *exactly once* guarantees even in *at least once* mode."
>>
>> Unfortunately, I have not been able to achieve exactly-once with at-least-once mode. Do I need more settings than I have for exactly-once mode?
>>
>> Many thanks in advance for your advice.
>>
>> Min
>
> --
> Best,
> Guowei
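The latency point raised in this thread (Fabian's and Guowei's remarks and Min's 5-second example) can be checked with a back-of-the-envelope model. This is an assumption-laden sketch, not Flink's API. `CommitLatencySketch` and its methods are hypothetical. Checkpoints are assumed to be triggered every `intervalMs` and to take `durationMs` to complete, and a record only becomes visible when the first checkpoint triggered after it completes. The model ignores barrier propagation time and the commit round trip to Kafka, so real latency can be higher.

```java
/**
 * Toy latency model (assumption, not Flink code): a transactional sink makes a record
 * visible only when the checkpoint covering it completes. Checkpoints are triggered at
 * intervalMs, 2 * intervalMs, ... and each one takes durationMs to complete.
 */
public class CommitLatencySketch {

    /** Time at which a record written at timeMs becomes visible to consumers. */
    static long visibleAt(long timeMs, long intervalMs, long durationMs) {
        // first checkpoint trigger strictly after the record was written
        long nextTrigger = (timeMs / intervalMs + 1) * intervalMs;
        return nextTrigger + durationMs;
    }

    /** Delay between writing a record and its becoming visible. */
    static long latency(long timeMs, long intervalMs, long durationMs) {
        return visibleAt(timeMs, intervalMs, durationMs) - timeMs;
    }

    public static void main(String[] args) {
        long interval = 5_000; // 5 s, as in Min's example
        long duration = 200;   // hypothetical checkpoint duration
        for (long t : new long[] {0, 2_500, 4_999}) {
            System.out.println("record at " + t + " ms -> visible after "
                    + latency(t, interval, duration) + " ms");
        }
    }
}
```

In this model, a record written just after a checkpoint trigger waits almost a full interval plus the checkpoint duration (5200 ms for the record at 0 ms). A record written just before a trigger waits only slightly longer than the checkpoint duration (201 ms).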