[ https://issues.apache.org/jira/browse/FLINK-36750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17899699#comment-17899699 ]
Leonard Xu commented on FLINK-36750:
------------------------------------

master: 3a2d799fd8eceea76b0abfb066e70b2ca4d58648
3.2: fb2a1d0c38fb34fac028de41bee25b4d5fb39989

> Paimon connector would reuse sequence number when schema evolution happened
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-36750
>                 URL: https://issues.apache.org/jira/browse/FLINK-36750
>             Project: Flink
>          Issue Type: Improvement
>          Components: Flink CDC
>    Affects Versions: cdc-3.2.0
>            Reporter: Yanquan Lv
>            Assignee: Yanquan Lv
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: cdc-3.2.1
>
>         Attachments: image-2024-11-20-13-00-58-282.png,
> image-2024-11-20-13-02-47-612.png, image-2024-11-20-13-04-53-635.png
>
>
> When schema evolution happens, we prepare a commit and recreate the
> FileStoreWrite to obtain the latest schema. However, FileStoreWrite maintains
> some state in memory, such as the sequence number, so we can't simply discard
> it and create a new one. Instead, we should extract that state from the old
> write and rebuild the new one with it.
> The sequence number determines the order of records that share the same
> primary key. If this order is not strictly maintained, reads may return
> unexpected results.
> The following pictures show the problem we are currently facing:
> 1) Schema evolution happened between the second and third files
> (`{*}schema_id{*}` changed).
> !image-2024-11-20-13-04-53-635.png!
> 2) The sequence numbers here should be increasing; however, the
> `{*}min_sequence_number{*}` of the third file overlaps with that of the
> second file.
> !image-2024-11-20-13-02-47-612.png!
> Because the sequence numbers are out of order, we may read update-before
> data.
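To illustrate why the in-memory sequence number has to survive the recreation of the write, here is a minimal toy model. It is only a sketch: `ToyWriter`, `merge_latest`, and `run` are hypothetical names made up for this example and are not Paimon or Flink CDC APIs. It assumes that, for a given primary key, the record with the highest sequence number wins on read.

```python
class ToyWriter:
    """Hypothetical stand-in for a write that keeps its sequence counter in memory."""

    def __init__(self, schema_id, next_sequence=0):
        self.schema_id = schema_id
        self.next_sequence = next_sequence

    def write(self, key, value):
        # Tag each record with the current sequence number, then advance it.
        record = (key, self.next_sequence, value)
        self.next_sequence += 1
        return record


def merge_latest(records):
    """For each primary key, keep the value with the highest sequence number."""
    latest = {}
    for key, seq, value in records:
        if key not in latest or seq > latest[key][0]:
            latest[key] = (seq, value)
    return {key: value for key, (_, value) in latest.items()}


def run(restore_sequence):
    # Three updates to the same primary key under the old schema.
    old = ToyWriter(schema_id=0)
    records = [old.write(1, "a"), old.write(1, "b"), old.write(1, "c")]

    # Schema evolution: the write is recreated, either carrying the counter
    # over (assumed fix) or starting again from zero (the reported problem).
    start = old.next_sequence if restore_sequence else 0
    new = ToyWriter(schema_id=1, next_sequence=start)
    records.append(new.write(1, "d"))

    return merge_latest(records)
```

In this model, `run(False)` returns the older value `"c"` for key `1` because the newest record reuses sequence number 0, while `run(True)` returns `"d"` as intended.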