[jira] [Updated] (FLINK-35600) Data read duplication during the full-to-incremental conversion phase

ASF GitHub Bot (Jira) Thu, 13 Jun 2024 22:36:05 -0700


     [ 
https://issues.apache.org/jira/browse/FLINK-35600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated FLINK-35600:
-----------------------------------
    Labels: pull-request-available  (was: )

> Data read duplication during the full-to-incremental conversion phase
> ---------------------------------------------------------------------
>
>                 Key: FLINK-35600
>                 URL: https://issues.apache.org/jira/browse/FLINK-35600
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: cdc-3.1.0
>            Reporter: Di Wu
>            Priority: Major
>              Labels: pull-request-available
>
> Assume that the table has been split into 3 Chunks
> Timeline
> t1: chunk1 is read
> t2: a piece of data A belonging to chunk2 is inserted in MySQL
> t3: chunk2 is read, and data A has been sent downstream
> t4: chunk3 is read
> At this time, startOffset will be set to lowwatermark
> t5: BinlogSplitReader.pollSplitRecords receives data A, and uses the method 
> shouldEmit to determine whether the data is sent downstream
> In this method
> {code:java}
> private boolean hasEnterPureBinlogPhase(TableId tableId, BinlogOffset 
> position) {
>     if (pureBinlogPhaseTables.contains(tableId)) {
>         return true;
>     }
>     // the existed tables those have finished snapshot reading
>     if (maxSplitHighWatermarkMap.containsKey(tableId)
>             && position.isAtOrAfter(maxSplitHighWatermarkMap.get(tableId))) {
>         pureBinlogPhaseTables.add(tableId);
>         return true;
>     }
> } {code}
> *maxSplitHighWatermarkMap.get(tableId)* obtains the HighWatermark data 
> without ts_sec variable, and the default value is 0
> *position.isAtOrAfter(maxSplitHighWatermarkMap.get(tableId))*
> So this expression is judged as true
> *Data A continues to be sent downstream, and the data is repeated*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-35600) Data read duplication during the full-to-incremental conversion phase

Reply via email to