Hello, Mohan

> 1. Does Flink have any support to track missed source JDBC CDC records?

The Flink CDC connectors provide exactly-once semantics, which means they won't 
miss records. Note that the plain Flink JDBC connector only scans the database 
once; it cannot continuously read a CDC stream.
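
For example, a minimal sketch using the DataStream API, assuming the
flink-cdc-connectors 2.x MySQL source is on the classpath (host, credentials,
database and table names are placeholders):

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class MySqlCdcExample {
    public static void main(String[] args) throws Exception {
        // Reads an initial snapshot and then continuously follows the binlog,
        // unlike the one-shot scan of the JDBC connector.
        MySqlSource<String> source = MySqlSource.<String>builder()
                .hostname("localhost")            // placeholder connection details
                .port(3306)
                .databaseList("mydb")
                .tableList("mydb.orders")
                .username("flinkuser")
                .password("secret")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .build();

        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing must be enabled for the exactly-once guarantee.
        env.enableCheckpointing(3000);

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "MySQL CDC Source")
           .print();

        env.execute("mysql-cdc-example");
    }
}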

> 2. What is the equivalent of Kafka consumer groups?

Different databases have different CDC mechanisms: for MySQL/MariaDB it is the 
server id, which the connector uses to register itself as a replica; for 
PostgreSQL it is the replication slot name.
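
If you need to set these explicitly, both source builders expose options for
them. A sketch (connection values and the slot name are placeholders; the MySQL
server-id range should cover the source parallelism, and the Postgres slot is
set here via the underlying Debezium 'slot.name' property):

import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.connectors.postgres.PostgreSQLSource;
import com.ververica.cdc.debezium.DebeziumSourceFunction;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import java.util.Properties;

public class CdcSourceIdentifiers {
    public static void main(String[] args) {
        // MySQL/MariaDB: the connector joins the replication topology like a
        // replica identified by this server-id (a range for parallel readers).
        MySqlSource<String> mysqlSource = MySqlSource.<String>builder()
                .hostname("localhost").port(3306)
                .databaseList("mydb").tableList("mydb.orders")
                .username("flinkuser").password("secret")
                .serverId("5400-5404")            // placeholder server-id range
                .deserializer(new JsonDebeziumDeserializationSchema())
                .build();

        // PostgreSQL: changes are read through a named logical replication slot,
        // configured via the Debezium "slot.name" property.
        Properties pgProps = new Properties();
        pgProps.setProperty("slot.name", "flink_cdc_slot");   // placeholder slot name

        DebeziumSourceFunction<String> postgresSource = PostgreSQLSource.<String>builder()
                .hostname("localhost").port(5432)
                .database("mydb").schemaList("public").tableList("public.orders")
                .username("flinkuser").password("secret")
                .deserializer(new JsonDebeziumDeserializationSchema())
                .debeziumProperties(pgProps)
                .build();
    }
}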


> 3. Delivering to Kafka from Flink is not exactly once. Is that right?

No, both the Flink CDC connectors and the Flink Kafka connector provide an 
exactly-once implementation (the Kafka sink writes in transactions that are 
committed with Flink's checkpoints).
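
A minimal sketch of the Kafka side, assuming Flink 1.14+ with the KafkaSink
(broker, topic and transactional-id prefix are placeholders); note that
exactly-once also needs checkpointing enabled and Kafka transaction timeouts
longer than your checkpoint interval:

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;
import org.apache.flink.streaming.api.datastream.DataStream;

public class ExactlyOnceKafkaSink {
    // Attach an exactly-once Kafka sink to a stream, e.g. the CDC stream above.
    public static void attach(DataStream<String> changeStream) {
        KafkaSink<String> sink = KafkaSink.<String>builder()
                .setBootstrapServers("localhost:9092")      // placeholder broker
                .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                        .setTopic("cdc-events")             // placeholder topic
                        .setValueSerializationSchema(new SimpleStringSchema())
                        .build())
                // Records are written in Kafka transactions committed on checkpoint.
                .setDeliveryGuarantee(DeliveryGuarantee.EXACTLY_ONCE)
                .setTransactionalIdPrefix("cdc-pipeline")   // required for exactly-once
                .build();

        changeStream.sinkTo(sink);
    }
}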

BTW, if your destination is Elasticsearch, the quick start demo[1] may help you.

Best,
Leonard

[1] 
https://ververica.github.io/flink-cdc-connectors/master/content/quickstart/mysql-postgres-tutorial.html


> 
> Thanks
> 
> On Friday, February 4, 2022, mohan radhakrishnan 
> <radhakrishnan.mo...@gmail.com> wrote:
> Hello,
>                So the JDBC source connector is Kafka and the transformation is 
> done by Flink (Flink SQL)? But that connector can miss records, I thought. 
> Started looking at Flink for this and other use cases.
> Can I see the alternative to Spring Cloud Stream (Kafka Streams)? Since I am 
> learning Flink, Kafka Streams' changelog topics, exactly-once delivery and 
> DLQs seemed good for our critical push notifications.
> 
> We also needed an Elasticsearch sink.
> 
> Thanks
> 
> On Friday, February 4, 2022, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
> Hi Mohan,
> 
> I don't know much about Kafka Connect, so I will not talk about its features 
> and differences from Flink. Flink on its own does not have the capability to read 
> a CDC stream directly from a DB. However, there is the flink-cdc-connectors[1] 
> project, which embeds the standalone Debezium engine inside Flink's source 
> and can process the DB changelog with all the processing guarantees that Flink 
> provides.
> 
> As for the idea of processing further with Kafka Streams: why not process the 
> data with Flink? What do you miss in Flink?
> 
> Best,
> 
> Dawid
> 
> [1] https://github.com/ververica/flink-cdc-connectors
> 
> On 04/02/2022 13:55, mohan radhakrishnan wrote:
> Hi,
>      When I was looking for CDC, I realized Flink uses a Kafka connector to 
> stream into Flink. The idea is to send it forward to Kafka and consume it using 
> Kafka Streams.
> 
> Are there source DLQs or additional mechanisms to detect failures to read 
> from the DB?
> 
> We don't want to use Debezium and our CDC is based on queries.
> 
> What mechanisms does Flink have that a Kafka Connect worker does not? Kafka 
> Connect workers can go down and source data can be lost.
> 
> Does the idea to send it forward to Kafka and consume it using Kafka Streams 
> make sense? Can the checkpointing feature of Flink help? I plan to use 
> Kafka Streams for 'exactly-once delivery' and changelog topics.
> 
> Could you point out relevant material to read ?
> 
> Thanks,
> Mohan
