Mingya Wang created FLINK-36081:
-----------------------------------
Summary: Flink CDC MySQL source connector missing some columns
data of newly added tables
Key: FLINK-36081
URL: https://issues.apache.org/jira/browse/FLINK-36081
Project: Flink
Issue Type: Bug
Components: Flink CDC
Affects Versions: cdc-3.1.1
Environment: JDK 11
Flink 1.17
Flink CDC 3.0.0
Reporter: Mingya Wang
Fix For: cdc-3.3.0
*Problem Description:*
When a table is added to a running job, the Flink CDC MySQL source connector
emits records that are missing data for some of that table's columns.
*Reproduction Scenario:*
# Remove a table from a CDC job that is running normally, then restart the job
so that it resumes from its previous state.
# Add a column to the removed table.
# Add the table back to the job. The job keeps running without error after the
table is added, but the synchronized data lacks values for the newly added
column.
*Cause Analysis:*
The MySQL CDC Source keeps each table's schema in state. When the table is
added back, the source recovers its schema from the previous state. That
schema still exists and describes the table as it was before the column was
added, so the source builds downstream records from the stale cached schema.
As a result, the records emitted to downstream systems lack the fields for the
newly added columns.
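The failure mode described in the cause analysis can be illustrated with a minimal, self-contained sketch. It is not Flink CDC code: the class `StaleSchemaDemo`, the method `project`, and the table and column names are hypothetical, and a plain map stands in for the schemas kept in state.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the connector's schema handling; not Flink CDC code.
class StaleSchemaDemo {

    // Projects a raw row onto the columns listed in the cached schema.
    // Columns that the cached schema does not know about are silently dropped.
    static Map<String, Object> project(List<String> cachedColumns, Map<String, Object> rawRow) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (String column : cachedColumns) {
            out.put(column, rawRow.get(column));
        }
        return out;
    }

    public static void main(String[] args) {
        // Schema captured in state before the table was removed from the job.
        Map<String, List<String>> schemasInState = new LinkedHashMap<>();
        schemasInState.put("db.orders", List.of("id", "amount"));

        // While the table was excluded, a column `note` was added upstream.
        Map<String, Object> rawRow = new LinkedHashMap<>();
        rawRow.put("id", 1);
        rawRow.put("amount", 10);
        rawRow.put("note", "new column");

        // The table is added back and the stale cached entry is reused,
        // so the emitted record has no `note` field.
        System.out.println(project(schemasInState.get("db.orders"), rawRow));
    }
}
```

Running `main` prints a record containing only `id` and `amount`, which mirrors the missing-column symptom reported here.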
*Proposed Solution:*
When a table is removed from the CDC job, it must also be removed from the
MySQLBinlogSplit, so that no stale schema for it remains in state when the
table is added back.
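One way to picture the proposed fix is the sketch below, which drops cached schemas for tables that are no longer captured. It is an assumption-laden illustration, not the actual patch: `SchemaPruningDemo` and `pruneRemovedTables` are hypothetical names, and a plain map stands in for whatever schema collection the binlog split holds.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of pruning cached schemas; not the Flink CDC API.
class SchemaPruningDemo {

    // Returns a copy of the cached schemas that keeps only tables still captured.
    // The input map is left unmodified.
    static <S> Map<String, S> pruneRemovedTables(
            Map<String, S> cachedSchemas, Set<String> capturedTables) {
        Map<String, S> pruned = new HashMap<>(cachedSchemas);
        // Removing keys from the key-set view removes the entries from the map.
        pruned.keySet().retainAll(capturedTables);
        return pruned;
    }

    public static void main(String[] args) {
        Map<String, List<String>> cached = new HashMap<>();
        cached.put("db.orders", List.of("id", "amount"));
        cached.put("db.users", List.of("id", "name"));

        // `db.orders` has been removed from the job's table list.
        Map<String, List<String>> pruned = pruneRemovedTables(cached, Set.of("db.users"));

        // With no cached entry left for `db.orders`, a later re-addition
        // would have to read the current schema instead of reusing a stale one.
        System.out.println(pruned.containsKey("db.orders")); // false
    }
}
```

Pruning at restore time, rather than when the table is re-added, means a stale entry never coexists with a re-added table; whether the real fix does this at the same point is not stated in this report.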