I do not see this problem myself, so I guess it is related to the configuration of your Kafka producer and consumer, and perhaps to the Kafka topic or broker properties as well. A sketch of the settings that usually matter is below.
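As a starting point, here is a rough sketch of the producing-job settings that drive this latency. It assumes the Flink 1.12-era FlinkKafkaProducer API (newer releases use KafkaSink instead), and the broker address, topic name, and checkpoint interval are placeholders, not recommendations:

import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducingJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // The exactly-once sink commits its Kafka transaction only when a
        // checkpoint completes, so this interval is effectively the floor on
        // the latency a read_committed downstream consumer will observe.
        env.enableCheckpointing(1_000L); // placeholder: 1s instead of a long interval
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

        Properties producerProps = new Properties();
        producerProps.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        // Must stay at or below the broker's transaction.max.timeout.ms
        // (15 minutes by default), otherwise the producer fails at startup.
        producerProps.setProperty("transaction.timeout.ms", "900000");

        FlinkKafkaProducer<String> sink = new FlinkKafkaProducer<>(
                "output-topic", // placeholder topic
                (KafkaSerializationSchema<String>) (element, timestamp) ->
                        new ProducerRecord<>("output-topic",
                                element.getBytes(StandardCharsets.UTF_8)),
                producerProps,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

        env.fromElements("a", "b", "c").addSink(sink);
        env.execute("exactly-once producer sketch");
    }
}

The point is that with Semantic.EXACTLY_ONCE the sink only commits on checkpoint completion, so shortening the checkpoint interval is the main lever for the latency the downstream job observes.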
Arvid Heise <ar...@ververica.com> wrote on Tuesday, January 5, 2021 at 6:47 PM:

> Hi Daniel,
>
> Flink commits transactions on checkpoints, while Kafka Streams/Connect
> usually commits per record. This is the typical tradeoff between
> throughput and latency. By decreasing the checkpoint interval in Flink,
> you can reach latency comparable to Kafka Streams.
>
> If you have two exactly-once jobs, the second job may only read data
> that has been committed (not dirty, as Chesnay said). If the second job
> consumed uncommitted data, it would produce duplicates whenever the
> first job fails and rolls back.
>
> You can configure the read behaviour with isolation.level. If you want
> exactly-once behaviour, you also need to set that level in your other
> Kafka consumers [1]. Also compare what Kafka Streams sets if you want
> to go exactly once [2].
>
> If you really want low latency, please also double-check whether you
> really need exactly once.
>
> [1] https://kafka.apache.org/documentation/#consumerconfigs_isolation.level
> [2] https://kafka.apache.org/documentation/streams/developer-guide/config-streams.html#processing-guarantee
>
> On Mon, Dec 28, 2020 at 12:22 PM Chesnay Schepler <ches...@apache.org> wrote:
>
>> I don't know our Kafka connector particularly well, but it sounds like
>> a matter of whether a given consumer does dirty reads.
>> Flink does not, whereas the other tools you're using do.
>>
>> On 12/28/2020 7:57 AM, Daniel Peled wrote:
>>
>> Hello,
>>
>> We have two Flink jobs that communicate with each other through a
>> Kafka topic. Both jobs use checkpoints with EXACTLY_ONCE semantics.
>>
>> We have seen the following behaviour, and we want to ask whether it is
>> expected or a bug.
>>
>> When the first job produces a message to Kafka, the message is
>> consumed by the second job with a latency that depends on the *first*
>> job's *checkpoint interval*.
>>
>> We are able to read the message using the Kafka tool or another Kafka
>> consumer, but NOT with a Flink Kafka consumer, where the delay again
>> depends on the checkpoint interval of the first job.
>>
>> How come the consumer of the second job depends on the producer
>> transaction commit time of the first job?
>>
>> BR,
>> Danny
>
> --
> Arvid Heise | Senior Java Developer
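To make the isolation.level point from Arvid's [1] concrete, here is a minimal standalone consumer sketch; the broker address, group id, and topic name are placeholders. Kafka's default is read_uncommitted, which is why the console tool sees the message immediately, while a read_committed consumer (the behaviour Daniel's second Flink job exhibits, per Chesnay's reply) only sees it once the first job's checkpoint has committed the transaction:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class IsolationLevelSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());

        // Kafka defaults to "read_uncommitted", so records are visible as soon
        // as they are produced; "read_committed" only returns records whose
        // transaction has been committed, i.e. after the producing Flink
        // job's checkpoint completes.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("output-topic")); // placeholder
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}

Flipping ISOLATION_LEVEL_CONFIG between the two values reproduces the difference Daniel describes between the Kafka console tool and the second Flink job.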