Ihor Mielientiev created FLINK-37065:
----------------------------------------

             Summary: MySQL cdc can lose/skip data during recovering from the 
checkpoint
                 Key: FLINK-37065
                 URL: https://issues.apache.org/jira/browse/FLINK-37065
             Project: Flink
          Issue Type: Bug
    Affects Versions: cdc-3.2.1, cdc-3.2.0
            Reporter: Ihor Mielientiev


On each pipeline start (e.g., failover or restart), the Flink CDC connector 
retrieves the current GTID set from the MySQL server and merges it with the 
GTID set stored in the pipeline's state. The merged GTID set is then sent back 
to the MySQL server to indicate which transactions the Flink pipeline has 
already processed, so that the server doesn’t resend them.

 

The Flink CDC MySQL connector uses the 
[fixRestoredGtidSet|https://github.com/apache/flink-cdc/blob/master/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/io/debezium/connector/mysql/GtidUtils.java#L36]
 method to merge the GTID set from the server with the GTID set from the 
checkpoint. The merged set tells MySQL to skip transactions that Flink has 
already processed, which avoids duplicates. However, the current 
implementation of this method doesn’t handle gaps caused by MySQL parallel 
execution. For example, if the restored GTID set is 1-80:83-90:92-98 and the 
server GTID set is 1-100, the method collapses the gaps and returns 1-98, 
because it keeps everything up to the highest GTID in the checkpoint.

As a result, if the pipeline restarts while the checkpointed GTID set contains 
such gaps, Flink CDC never processes the transactions inside the gaps (81-82 
and 91 in the example above), and that data is lost.
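
The example above can be reproduced with a small standalone sketch. It is not 
the connector's code: the class GtidGapSketch and all of its methods are 
hypothetical, it handles only the interval part of a single server UUID, and 
preserveGaps (a plain interval intersection) is just one possible 
gap-preserving merge, not necessarily the fix that should go into 
fixRestoredGtidSet.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from GtidUtils.
final class GtidGapSketch {

    // Parses the interval part of one UUID's GTID set, e.g. "1-80:83-90".
    // Each interval is a long[]{start, end} with both bounds inclusive.
    static List<long[]> parse(String intervals) {
        List<long[]> out = new ArrayList<>();
        for (String part : intervals.split(":")) {
            String[] bounds = part.split("-");
            long start = Long.parseLong(bounds[0]);
            long end = bounds.length > 1 ? Long.parseLong(bounds[1]) : start;
            out.add(new long[] {start, end});
        }
        return out;
    }

    static String format(List<long[]> intervals) {
        StringBuilder sb = new StringBuilder();
        for (long[] iv : intervals) {
            if (sb.length() > 0) {
                sb.append(':');
            }
            sb.append(iv[0]);
            if (iv[1] != iv[0]) {
                sb.append('-').append(iv[1]);
            }
        }
        return sb.toString();
    }

    // Behaviour described in this report: keep every server transaction up to
    // the highest restored GTID, which silently fills the gaps.
    static List<long[]> collapseToHighest(List<long[]> restored, List<long[]> server) {
        long highest = 0;
        for (long[] iv : restored) {
            highest = Math.max(highest, iv[1]);
        }
        List<long[]> out = new ArrayList<>();
        for (long[] iv : server) {
            if (iv[0] <= highest) {
                out.add(new long[] {iv[0], Math.min(iv[1], highest)});
            }
        }
        return out;
    }

    // Assumed alternative: intersect the two sorted interval lists so that
    // transactions missing from the checkpoint stay missing.
    static List<long[]> preserveGaps(List<long[]> restored, List<long[]> server) {
        List<long[]> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < restored.size() && j < server.size()) {
            long lo = Math.max(restored.get(i)[0], server.get(j)[0]);
            long hi = Math.min(restored.get(i)[1], server.get(j)[1]);
            if (lo <= hi) {
                out.add(new long[] {lo, hi});
            }
            // Advance whichever interval ends first.
            if (restored.get(i)[1] < server.get(j)[1]) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<long[]> restored = parse("1-80:83-90:92-98");
        List<long[]> server = parse("1-100");
        System.out.println(format(collapseToHighest(restored, server)));
        System.out.println(format(preserveGaps(restored, server)));
    }
}
```

With the sets from the example, the collapsing merge yields 1-98 and the 
intersection yields 1-80:83-90:92-98, so transactions 81-82 and 91 are the 
ones that would be skipped after recovery.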



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
