Ihor Mielientiev created FLINK-37065:
----------------------------------------
Summary: MySQL cdc can lose/skip data during recovering from the checkpoint
Key: FLINK-37065
URL: https://issues.apache.org/jira/browse/FLINK-37065
Project: Flink
Issue Type: Bug
Affects Versions: cdc-3.2.1, cdc-3.2.0
Reporter: Ihor Mielientiev

During each pipeline start (e.g., failover or restart), the Flink CDC connector retrieves the current GTID set from the MySQL server and merges it with the GTID set stored in the pipeline's state. The merged GTID set is then sent back to the MySQL server to indicate which transactions the Flink pipeline has already processed, so that the server doesn’t resend them.

The Flink CDC MySQL connector uses the [fixRestoredGtidSet|https://github.com/apache/flink-cdc/blob/master/flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/io/debezium/connector/mysql/GtidUtils.java#L36] method to merge the GTID set from the server with the GTID set from the checkpoint. The method ensures that Flink "tells" MySQL to skip transactions it has already processed, avoiding duplication.

However, the current implementation of this method doesn’t handle gaps caused by MySQL parallel execution. For example, if the restored GTID set is 1-80:83-90:92-98 and the server GTID set is 1-100, the method closes the gaps and the result is 1-98, because it selects the highest GTID from the checkpoint. Transactions 81-82 and 91 are then reported to MySQL as already processed, although the pipeline never saw them.

So if the pipeline is restarted while such gaps exist, Flink CDC won’t process the transactions in the gaps and will lose data.
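
The example in the description can be reproduced with a small standalone sketch. This is not the connector's code: it models a GTID set for a single server UUID as a list of inclusive integer intervals, and the function names (`merge_by_highest`, `merge_preserving_gaps`) are hypothetical. `merge_by_highest` mimics the behaviour described in this report; `merge_preserving_gaps` is one possible alternative, assumed here for illustration only, that fills in just the server transactions preceding the first restored interval and keeps the restored intervals untouched.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]  # inclusive (start, end) transaction numbers


def parse_intervals(text: str) -> List[Interval]:
    """Parse '1-80:83-90:92-98' into [(1, 80), (83, 90), (92, 98)]."""
    intervals = []
    for part in text.split(":"):
        start, _, end = part.partition("-")
        intervals.append((int(start), int(end or start)))
    return sorted(intervals)


def format_intervals(intervals: List[Interval]) -> str:
    """Render intervals back into the 'a-b:c-d' notation."""
    return ":".join(f"{s}-{e}" if s != e else str(s) for s, e in intervals)


def expand(intervals: List[Interval]) -> Set[int]:
    """Expand intervals into the set of individual transaction numbers."""
    return {n for s, e in intervals for n in range(s, e + 1)}


def merge_by_highest(server: List[Interval], restored: List[Interval]) -> List[Interval]:
    """Behaviour described in the report: keep every server transaction
    up to the highest restored GTID, which closes any gaps."""
    highest = max(end for _, end in restored)
    return [(s, min(e, highest)) for s, e in server if s <= highest]


def merge_preserving_gaps(server: List[Interval], restored: List[Interval]) -> List[Interval]:
    """Hypothetical alternative: take only server transactions that come
    before the first restored GTID, then keep the restored intervals as-is."""
    first_start = min(start for start, _ in restored)
    head = [(s, min(e, first_start - 1)) for s, e in server if s < first_start]
    return sorted(head + restored)


server = parse_intervals("1-100")
restored = parse_intervals("1-80:83-90:92-98")

lossy = merge_by_highest(server, restored)
safe = merge_preserving_gaps(server, restored)

print(format_intervals(lossy))                     # 1-98
print(format_intervals(safe))                      # 1-80:83-90:92-98
print(sorted(expand(lossy) - expand(restored)))    # [81, 82, 91]
```

The last line lists the transactions that the first merge marks as processed even though they are absent from the checkpoint; these are the ones that would be skipped after recovery.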