If your implementation only commits its changes after a checkpoint
completes, I think the end-to-end latency can be up to about one
checkpoint interval.
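
As a rough sketch of that arithmetic (a toy model, not Flink code): assume
checkpoints complete instantly at exact multiples of the interval, and that
a record becomes visible at the first completion at or after its arrival.
The function name and the sample arrival times are made up for illustration.

```python
def visible_latency_ms(arrival_ms: int, interval_ms: int) -> int:
    """Wait until the next checkpoint completion at or after arrival_ms.

    Assumption: checkpoints complete instantly at multiples of interval_ms.
    """
    # Ceiling division on integers: index of the next checkpoint boundary.
    next_commit_ms = -(-arrival_ms // interval_ms) * interval_ms
    return next_commit_ms - arrival_ms


interval_ms = 5000  # the 5-second interval from the question
arrivals_ms = [500, 2500, 4900, 5000]
latencies_ms = [visible_latency_ms(t, interval_ms) for t in arrivals_ms]
```

Under these assumptions a record arriving just after a checkpoint waits
almost the whole interval, and one arriving just before it waits almost
nothing. A real checkpoint also takes some time to complete, which adds to
the wait.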

I think the document means that a topology consisting only of
flatMap/filter/map operators (no task has more than one input) can achieve
exactly-once semantics even in at-least-once mode. With a single input
there is no second barrier to wait for, so for such a topology barrier
handling in at-least-once mode happens to have the same effect as in
exactly-once mode.
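
To illustrate the point, here is a small toy model (plain Python, not Flink
code; the function and variable names are invented). It assumes a task
takes its snapshot once a barrier has arrived on every input, and that any
record processed before the snapshot but located behind a barrier on its
own channel gets replayed after a restore.

```python
BARRIER = object()  # sentinel standing in for a checkpoint barrier


def replayed_after_restore(events, num_inputs, align):
    """Return records that would be seen twice after a restore.

    events: (channel, item) pairs in the order they reach one task.
    align:  True models exactly-once mode (post-barrier records are held
            back), False models at-least-once mode (they are processed).
    """
    barrier_seen = set()
    duplicates = []
    for channel, item in events:
        if item is BARRIER:
            barrier_seen.add(channel)
            if len(barrier_seen) == num_inputs:
                break  # all barriers arrived: the snapshot is taken here
        elif channel in barrier_seen and not align:
            # Processed before the snapshot, but it sits behind the barrier
            # on its channel, so it is replayed after a restore.
            duplicates.append(item)
    return duplicates


two_inputs = [("a", 1), ("a", BARRIER), ("a", 2), ("b", 10), ("b", BARRIER)]
one_input = [("a", 1), ("a", BARRIER), ("a", 2)]

dup_two_at_least_once = replayed_after_restore(two_inputs, 2, align=False)
dup_two_exactly_once = replayed_after_restore(two_inputs, 2, align=True)
dup_one_at_least_once = replayed_after_restore(one_input, 1, align=False)
```

In this model only the two-input task without alignment ends up with a
duplicated record. The single-input task takes its snapshot at the first
barrier, so there is nothing for alignment to change.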

I think that in some situations there could be a benefit to giving the
source, flatMap and sink the same parallelism (there may be other ways as
well). During alignment in exactly-once mode, a task with several inputs
does not process the elements that arrive behind a barrier until the
barriers from all of its inputs have arrived. (If your checkpoint interval
is very long, I think there would be no noticeable difference.)
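
A back-of-the-envelope way to see the cost of alignment, assuming the only
thing that matters is when each input's barrier reaches the task (the
function name and the timestamps below are invented for illustration):

```python
def alignment_stall_ms(barrier_arrivals_ms):
    """Time between the first and the last barrier reaching one task.

    This is the longest any input is held back during alignment.
    """
    return max(barrier_arrivals_ms) - min(barrier_arrivals_ms)


stall_many_inputs = alignment_stall_ms([1000, 1040, 1300])  # three inputs
stall_one_input = alignment_stall_ms([1000])                # one input
```

With one input the first barrier is also the last one, so the stall is
zero. With several inputs the earliest channel waits for the slowest
barrier.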


Best
Guowei

On Sun, Apr 7, 2019 at 3:14 AM, <min....@ubs.com> wrote:

> Hi,
>
>
>
> I have a simple data pipeline of a Kafka source, a Flink map operator and
> a Kafka sink.
>
>
>
> I have a quick question about the latency caused by checkpointing in
> exactly-once mode.
>
>
>
> Since the changes are committed and become visible only on checkpoint
> completion, the latency could be as long as the checkpoint interval,
> e.g. 5 seconds?
>
>
>
> Is my understanding correct?
>
>
>
> If I use at-least-once mode, there will be this additional latency. More
> interestingly, the Flink document
> https://ci.apache.org/projects/flink/flink-docs-release-1.7/internals/stream_checkpointing.html
> indicates that "dataflows with only embarrassingly parallel streaming
> operations (map(), flatMap(), filter(), …) actually give *exactly once*
> guarantees even in *at least once* mode."
>
>
>
> Unfortunately, I have not been able to achieve exactly-once behaviour in
> at-least-once mode. Do I need more settings than I have with exactly-once
> mode?
>
>
>
> Many thanks in advance for the advice.
>
>
>
> Min