I do not see this problem myself, so I guess it is related to the configuration of your Kafka producer and consumer, and perhaps to the Kafka topic or broker properties as well. A sketch of the settings that usually matter is below.
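As a starting point, here is a rough sketch of the producing-job settings that drive this latency. It assumes the Flink 1.12-era FlinkKafkaProducer API (newer releases use KafkaSink instead), and the broker address, topic name, and checkpoint interval are placeholders, not recommendations:

import java.nio.charset.StandardCharsets;
import java.util.Properties;

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ProducingJobSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // The exactly-once sink commits its Kafka transaction only when a
        // checkpoint completes, so this interval is effectively the floor on
        // the latency a read_committed downstream consumer will observe.
        env.enableCheckpointing(1_000L); // placeholder: 1s instead of a long interval
        env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);

        Properties producerProps = new Properties();
        producerProps.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        // Must stay at or below the broker's transaction.max.timeout.ms
        // (15 minutes by default), otherwise the producer fails at startup.
        producerProps.setProperty("transaction.timeout.ms", "900000");

        FlinkKafkaProducer<String> sink = new FlinkKafkaProducer<>(
                "output-topic", // placeholder topic
                (KafkaSerializationSchema<String>) (element, timestamp) ->
                        new ProducerRecord<>("output-topic",
                                element.getBytes(StandardCharsets.UTF_8)),
                producerProps,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

        env.fromElements("a", "b", "c").addSink(sink);
        env.execute("exactly-once producer sketch");
    }
}

The point is that with Semantic.EXACTLY_ONCE the sink only commits on checkpoint completion, so shortening the checkpoint interval is the main lever for the latency the downstream job observes.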
Arvid Heise <ar...@ververica.com> wrote on Tuesday, January 5, 2021 at 6:47 PM:

> Hi Daniel,
>
> Flink commits transactions on checkpoints, while Kafka Streams/Connect
> usually commits per record. This is the typical tradeoff between
> throughput and latency. By decreasing the checkpoint interval in Flink,
> you can reach latency comparable to Kafka Streams.
>
> If you have two exactly-once jobs, the second job may only read data
> that has been committed (not dirty, as Chesnay said). If the second job
> consumed uncommitted data, it would produce duplicates whenever the
> first job fails and rolls back.
>
> You can configure the read behaviour with isolation.level. If you want
> exactly-once behaviour, you also need to set that level in your other
> Kafka consumers [1]. Also compare what Kafka Streams sets if you want
> to go exactly once [2].
>
> If you really want low latency, please also double-check whether you
> really need exactly once.
>
> [1] https://kafka.apache.org/documentation/#consumerconfigs_isolation.level
> [2] https://kafka.apache.org/documentation/streams/developer-guide/config-streams.html#processing-guarantee
>
> On Mon, Dec 28, 2020 at 12:22 PM Chesnay Schepler <ches...@apache.org> wrote:
>
>> I don't know our Kafka connector particularly well, but it sounds like
>> a matter of whether a given consumer does dirty reads.
>> Flink does not, whereas the other tools you're using do.
>>
>> On 12/28/2020 7:57 AM, Daniel Peled wrote:
>>
>> Hello,
>>
>> We have two Flink jobs that communicate with each other through a
>> Kafka topic. Both jobs use checkpoints with EXACTLY_ONCE semantics.
>>
>> We have seen the following behaviour, and we want to ask whether it is
>> expected or a bug.
>>
>> When the first job produces a message to Kafka, the message is
>> consumed by the second job with a latency that depends on the *first*
>> job's *checkpoint interval*.
>>
>> We are able to read the message using the Kafka tool or another Kafka
>> consumer, but NOT with a Flink Kafka consumer, where the delay again
>> depends on the checkpoint interval of the first job.
>>
>> How come the consumer of the second job depends on the producer
>> transaction commit time of the first job?
>>
>> BR,
>> Danny
>
> --
> Arvid Heise | Senior Java Developer
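To make the isolation.level point from Arvid's [1] concrete, here is a minimal standalone consumer sketch; the broker address, group id, and topic name are placeholders. Kafka's default is read_uncommitted, which is why the console tool sees the message immediately, while a read_committed consumer (the behaviour Daniel's second Flink job exhibits, per Chesnay's reply) only sees it once the first job's checkpoint has committed the transaction:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class IsolationLevelSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());

        // Kafka defaults to "read_uncommitted", so records are visible as soon
        // as they are produced; "read_committed" only returns records whose
        // transaction has been committed, i.e. after the producing Flink
        // job's checkpoint completes.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("output-topic")); // placeholder
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}

Flipping ISOLATION_LEVEL_CONFIG between the two values reproduces the difference Daniel describes between the Kafka console tool and the second Flink job.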