Mingya Wang created FLINK-36081:
-----------------------------------

             Summary: Flink CDC MySQL source connector missing some columns 
data of newly added tables
                 Key: FLINK-36081
                 URL: https://issues.apache.org/jira/browse/FLINK-36081
             Project: Flink
          Issue Type: Bug
          Components: Flink CDC
    Affects Versions: cdc-3.1.1
         Environment: jdk 11, flink 1.17, flinkcdc 3.0.0
            Reporter: Mingya Wang
             Fix For: cdc-3.3.0


*Problem Description:*

When a table is added to a running job, the Flink CDC MySQL source connector 
emits records for that table with data missing for some of its columns.

*Reproduction Scenario:*
 # Remove a table from a CDC job that is running normally, then restart the job 
so that it resumes from its previous state.
 # Add a column to the removed table.
 # Add the table back to the job. The job keeps running without errors after 
the table is added, but the synchronized data lacks values for the newly added 
column.

*Cause Analysis:*

The MySQL CDC Source keeps each table's schema in state. When the table is 
added back, the source restores its schema from the previous state. Because 
that earlier schema still exists and describes the table as it was before the 
column was added, the source builds downstream records from the stale schema 
cached in state. As a result, records emitted downstream lack the fields for 
the newly added columns.
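
The effect described above can be illustrated with a small toy model. This is 
only a sketch, not the connector's code: the class StaleSchemaDemo, the map 
schemasInState, and the table db.orders are hypothetical names, and the schema 
is reduced to a list of column names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical) of a source that builds records from a schema cached in state. */
class StaleSchemaDemo {

    // Hypothetical stand-in for the schemas restored from state: table name -> column names.
    static final Map<String, List<String>> schemasInState = new LinkedHashMap<>();

    /** Projects a raw row onto the cached schema; columns the cache does not know are dropped. */
    static Map<String, Object> emit(String table, Map<String, Object> rawRow) {
        List<String> columns = schemasInState.get(table);
        Map<String, Object> record = new LinkedHashMap<>();
        for (String column : columns) {
            record.put(column, rawRow.get(column));
        }
        return record;
    }

    /** Replays the three reproduction steps against the toy state. */
    static Map<String, Object> reproduce() {
        schemasInState.clear();
        // Schema captured while the table was still part of the job.
        schemasInState.put("db.orders", new ArrayList<>(List.of("id", "amount")));

        // Steps 1 and 2: the table is removed from the job and a column is added
        // in MySQL. Nothing updates or evicts the cached entry in the meantime.

        // Step 3: the table is added back; a new row now carries the extra column.
        Map<String, Object> rawRow = new LinkedHashMap<>();
        rawRow.put("id", 1);
        rawRow.put("amount", 9.5);
        rawRow.put("note", "added later");
        return emit("db.orders", rawRow);
    }

    public static void main(String[] args) {
        // The "note" column is absent from the printed record.
        System.out.println(reproduce());
    }
}
```

In this model the row read from the database contains the new column, but the 
emitted record does not, because only the cached column list is consulted.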

*Proposed Solution:*

When a table is removed from the CDC job, it should also be removed from the 
MySQLBinlogSplit, so that no stale schema for it remains in state.
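
A minimal sketch of the idea, under the same toy assumptions: the state is 
modelled as a plain map from table name to column names, and 
pruneRemovedTables and schemaFor are hypothetical helpers, not methods of the 
connector. It only shows why dropping the removed table's entry forces a fresh 
schema lookup when the table is added back.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical) of pruning removed tables from restored state. */
class PruneRemovedTablesDemo {

    /** Returns a copy of the restored schemas that keeps only tables still captured by the job. */
    static Map<String, List<String>> pruneRemovedTables(
            Map<String, List<String>> restored, Set<String> capturedTables) {
        Map<String, List<String>> kept = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : restored.entrySet()) {
            if (capturedTables.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    /** Uses the cached schema if present; otherwise reads the current one and caches it. */
    static List<String> schemaFor(
            String table,
            Map<String, List<String>> cached,
            Map<String, List<String>> liveDatabase) {
        return cached.computeIfAbsent(table, liveDatabase::get);
    }

    public static void main(String[] args) {
        Map<String, List<String>> restored = new HashMap<>();
        restored.put("db.orders", List.of("id", "amount"));
        restored.put("db.users", List.of("id", "name"));

        // db.orders was removed from the job, so only db.users is still captured.
        Map<String, List<String>> state = pruneRemovedTables(restored, Set.of("db.users"));

        // Meanwhile a column was added to db.orders in the database.
        Map<String, List<String>> live = new HashMap<>();
        live.put("db.orders", List.of("id", "amount", "note"));

        // Adding the table back finds no cached entry and picks up the new column.
        System.out.println(schemaFor("db.orders", state, live));
    }
}
```

Without the pruning step, schemaFor would return the cached two-column schema 
and the new column would never reach the emitted records.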



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
