Re: [PR] [FLINK-37065]: MySQL cdc can lose/skip data during recovering from the checkpoint [flink-cdc]

via GitHub Wed, 06 Aug 2025 14:03:43 -0700


mielientiev commented on code in PR #3845:
URL: https://github.com/apache/flink-cdc/pull/3845#discussion_r2258292595



##########
flink-cdc-connect/flink-cdc-source-connectors/flink-connector-mysql-cdc/src/main/java/io/debezium/connector/mysql/GtidUtils.java:
##########
@@ -36,36 +38,53 @@ public class GtidUtils {
     public static GtidSet fixRestoredGtidSet(GtidSet serverGtidSet, GtidSet restoredGtidSet) {
         Map<String, GtidSet.UUIDSet> newSet = new HashMap<>();
         serverGtidSet.getUUIDSets().forEach(uuidSet -> newSet.put(uuidSet.getUUID(), uuidSet));
-        for (GtidSet.UUIDSet uuidSet : restoredGtidSet.getUUIDSets()) {
-            GtidSet.UUIDSet serverUuidSet = newSet.get(uuidSet.getUUID());
+        for (GtidSet.UUIDSet restoredUuidSet : restoredGtidSet.getUUIDSets()) {

Review Comment:
   @lzshlzsh Thanks for the feedback.
   
   That's a very valid point. I think it can happen when you have a pool of 
MySQL hosts with some replication delay. Imagine that your pipeline is 
restarted and reconnects to a "lagging" host that hasn't yet applied 
transactions your pipeline already processed in the previous run.
   I implemented intersection logic with the serverId range.
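
For illustration, the lagging-host case can be sketched as a plain interval intersection. This is a minimal sketch and not the code in this PR: `GtidIntervalSketch`, `Range`, and `intersect` are hypothetical names, and it models the intervals of a single source UUID only, assuming both lists are sorted and non-overlapping.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Debezium or flink-cdc.
public class GtidIntervalSketch {

    /** Inclusive transaction range, e.g. 1-100 in "A:1-100" (stand-in for a GTID interval). */
    public static final class Range {
        public final long start;
        public final long end;

        public Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return start + "-" + end;
        }
    }

    /**
     * Keeps only the transactions present in both lists: the part of the
     * restored (checkpointed) ranges that the server has also executed.
     */
    public static List<Range> intersect(List<Range> restored, List<Range> server) {
        List<Range> result = new ArrayList<>();
        for (Range r : restored) {
            for (Range s : server) {
                long lo = Math.max(r.start, s.start);
                long hi = Math.min(r.end, s.end);
                if (lo <= hi) {
                    // The two ranges overlap on [lo, hi].
                    result.add(new Range(lo, hi));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Checkpoint saw A:1-100, but the lagging host has only applied A:1-80.
        List<Range> restored = List.of(new Range(1, 100));
        List<Range> laggingServer = List.of(new Range(1, 80));
        System.out.println(intersect(restored, laggingServer)); // prints [1-80]
    }
}
```

With a lagging host the result is capped at what the server has actually applied, so transactions 81-100 are not assumed to be present on that host.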
   
   I also preserved the original test logic, e.g. `A:1-100`. Indeed, it looks 
simpler.
   
   Please take another look.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@flink.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
