Dear Yaroslav,

Thanks for the explanation :-)
So if I only want a partial snapshot based on a certain statement, is it possible to do that with Flink CDC? The reason is that the table is huge (100+ GB), so a full snapshot would overwhelm the Flink application. Or are there any other practices that I can follow here?

Br,
Ken Hung

On Wed, Oct 9, 2024 at 10:34 PM Yaroslav Tkachenko <yaros...@goldsky.com> wrote:

> Hi Ken,
>
> Snapshotting is implemented differently in Flink CDC, it doesn't re-use
> Debezium's implementation. So you can override some Debezium properties
> using "debezium.", but not "debezium.snapshot.".
>
> On Wed, Oct 9, 2024 at 12:46 PM Ken CHUAN YU <ken.hung...@gmail.com>
> wrote:
>
>> Hi there
>> I have an issue using the Flink SQL connector to capture change data
>> from MariaDB (MySQL) when configuring "debezium.*" settings. Here are
>> more details:
>> I have the following table in the source database (MariaDB):
>> ‘’’CREATE TABLE `client_test` (
>> `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
>> `name` varchar(500) NOT NULL DEFAULT '',
>> `age` int(11) NOT NULL,
>> PRIMARY KEY (`id`)
>> );
>> ‘’’
>>
>> For some reason I need only partial data from this table in the
>> snapshot, so I define the Flink stream table as follows:
>>
>> ‘’’CREATE TABLE client_cdc (
>> id DOUBLE,
>> name VARCHAR(500),
>> age DOUBLE,
>> PRIMARY KEY(id) NOT ENFORCED
>> )
>> WITH (
>> 'connector' = 'mysql-cdc',
>> 'hostname' = 'mariadb',
>> 'port' = '3306',
>> 'username' = 'ooo',
>> 'password' = 'ooo',
>> 'database-name' = 'xxx',
>> 'scan.startup.mode' = 'initial',
>> 'table-name' = 'client_test',
>> 'debezium.snapshot.query.mode' = 'custom',
>> 'debezium.snapshot.select.statement.overrides' = 'xxx.client_test',
>> 'debezium.snapshot.select.statement.overrides.xxx.client_test' = 'SELECT * FROM xxx.client_test WHERE id > 3'
>> );
>> ‘’’
>> Above, I tried to filter out the rows whose id is not greater than 3
>> during the snapshot. But after executing select * from client_cdc; in
>> the Flink client, I can still see the full snapshot.
>> I also tried to run this:
>>
>> ‘’’CREATE TABLE client_cdc (
>> id DOUBLE,
>> name VARCHAR(500),
>> age DOUBLE,
>> PRIMARY KEY(id) NOT ENFORCED
>> )
>> WITH (
>> 'connector' = 'mysql-cdc',
>> 'hostname' = 'mariadb',
>> 'port' = '3306',
>> 'username' = 'ooo',
>> 'password' = 'ooo',
>> 'database-name' = 'xxx',
>> 'scan.startup.mode' = 'initial',
>> 'table-name' = 'client_test',
>> 'debezium.snapshot.query.mode' = 'custom',
>> 'debezium.snapshot.select.statement.overrides' = 'xxx.client_test',
>> 'debezium.snapshot.select.statement.overrides.xxx.client_test' = 'this should failed SELECT * FROM xxx.client_test WHERE id > 3'
>> );
>> ‘’’
>> This time I gave an invalid query to
>> 'debezium.snapshot.select.statement.overrides.xxx.client_test', but I
>> can still execute select * from client_cdc; and it still takes a full
>> snapshot. In other words, it seems to me that the Flink CDC connector
>> is ignoring the settings prefixed with debezium.*
>> According to the documentation I should be able to configure Debezium,
>> but that doesn't seem to be the case.
>>
>> The expectation is to see only the rows selected by the snapshot query
>> configured in
>> "debezium.snapshot.select.statement.overrides.[database].[table]".
>> For example, with SELECT * FROM xxx.client_test WHERE id > 3, I should
>> only see rows with id greater than 3 in the stream table after the
>> snapshot, even if rows with smaller ids exist in the table in the
>> MySQL database.
>>
>> Am I missing anything here?
>>
>> The Flink version I'm using: 1.18
>> Flink CDC connector I'm using: flink-sql-connector-mysql-cdc-3.1.1
>> JDBC version: mysql-connector-j-9.0.0
>> Here is the setting about debezium.snapshot.select.statement.overrides:
>> https://debezium.io/documentation/reference/1.9/connectors/mysql.html#mysql-property-snapshot-select-statement-overrides
>>
>> Thank you for your help in advance
>> Br,
>> Ken Hung
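
On the "other practices" question, one direction that might be worth checking is sketched below. It does not filter the snapshot; it skips the snapshot altogether by starting from the latest binlog offset, which is a different technique from the select-statement override attempted in the thread. The column types are an assumption (the thread uses DOUBLE), and loading the existing rows by some other route is not shown. Whether this fits depends on whether the historical rows are needed at all.

```sql
-- Sketch only: reuses the placeholder connection values from the thread.
CREATE TABLE client_cdc (
  id BIGINT,          -- assumption: INT UNSIGNED read as BIGINT instead of DOUBLE
  name VARCHAR(500),
  age INT,            -- assumption: INT instead of DOUBLE
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'mariadb',
  'port' = '3306',
  'username' = 'ooo',
  'password' = 'ooo',
  'database-name' = 'xxx',
  'table-name' = 'client_test',
  -- 'latest-offset' reads only changes made after the job starts,
  -- so no initial snapshot of the 100+ GB table is taken.
  'scan.startup.mode' = 'latest-offset'
);
```

If the full history is needed after all, keeping 'scan.startup.mode' = 'initial' and tuning 'scan.incremental.snapshot.chunk.size' may be an alternative, since the connector's incremental snapshot reads the table in chunks; the right chunk size for this table is not something the thread establishes.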