Hi Min,

Guowei is right, the comment in the documentation about exactly-once in
embarrassingly parallel data flows refers to exactly-once *state
consistency*, not *end-to-end* exactly-once.
However, in strictly forwarding pipelines, enabling exactly-once
checkpoints should not have drawbacks compared to at-least-once since there
won't be any barrier alignment anyway. I'd simple enable exactly-once.

If you enable end-to-end exactly-once for a Kafka sink, the latency will be
at least the checkpointing interval (maybe even more, depending on how you
configure Flink's checkpointing mechanism and event-time / watermarks).

Best,
Fabian

Am So., 7. Apr. 2019 um 08:41 Uhr schrieb Guowei Ma <guowei....@gmail.com>:

> If your implementation only commits your changing after the complete of a
> checkpoint I think the latency of e2e is at least the interval of
> checkpoint.
>
> I think the document wants to say that a topology, which only has
> flatmap/filter/map(no  task has more than one input) could achieve the
> exactly once semantics even in at least mode since the effect of barrier
> alignments in at least mode is same as in exactly once mode by coincidence
> for such topology.
>
> I think there might be some benefits if you could set the parallelism of
> source/sink/flatmap to the same parallelism(there could exist other way) in
> some situation since during the alignments the task, which has many inputs
> would not deal with the elements behind the barrier in exactly mode until
> the barriers of all inputs  arrive.  (If your checkpoint interval is very
> very long I think there would be no difference).
>
>
> Best
> Guowei
>
> <min....@ubs.com>于2019年4月7日 周日上午3:14写道:
>
>> Hi,
>>
>>
>>
>> I have a simple data pipeline of a Kafka source, a flink map operator
>> and  a Kafka sink.
>>
>>
>>
>> I have a quick question about latency caused by the checkpoint on the
>> exactly once mode.
>>
>>
>>
>> Due to the changes are committed and visible on a checkpoint completion,
>> so the latency could be as long as that length of checkpoint interval e.g.
>> 5seconds?
>>
>>
>>
>> Is my understanding correct?
>>
>>
>>
>> If I use the at least mode, there will be this addition on latency.  More
>> interestingly, the flink document
>> https://ci.apache.org/projects/flink/flink-docs-release-1.7/internals/stream_checkpointing.html
>> indicate that "dataflows with only embarrassingly parallel streaming
>> operations (map(), flatMap(), filter(), …) actually give *exactly once* 
>> guarantees
>> even in *at least once* mode."
>>
>>
>>
>> Unfortunately, I have been not able to achieve the exactly once with the
>> at least once. Do I need more settings than I have with the exactly once
>> mode?
>>
>>
>>
>> Many thanks for the advises in advance.
>>
>>
>>
>> Min
>>
>>
>>
>>
>>
>>
>>
>>
>>
> --
> Best,
> Guowei
>

Reply via email to