[ 
https://issues.apache.org/jira/browse/FLINK-36750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yanquan Lv updated FLINK-36750:
-------------------------------
    Description: 
When schema evolution happens, we prepare a commit and recreate the 
FileStoreWrite to pick up the latest schema. However, FileStoreWrite keeps 
some information, such as the sequence number, in memory, so we can't simply 
discard it and create a new one. Instead, we should extract that information 
from the old writer and rebuild the new one with it.
The sequence number determines the order of records that share the same 
primary key. If this order is not strictly maintained, it may lead to 
unexpected results.
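
To make the ordering problem concrete, here is a minimal sketch in Python. It 
is not Paimon or Flink CDC code: `SketchWriter`, `Record`, `merge_latest` and 
`run` are hypothetical names, and the merge rule (highest sequence number wins 
per primary key) is a simplifying assumption used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    key: int
    value: str
    seq: int


class SketchWriter:
    """Hypothetical stand-in for a writer that keeps its sequence counter in memory."""

    def __init__(self, next_seq: int = 0) -> None:
        self.next_seq = next_seq

    def write(self, key: int, value: str) -> Record:
        # Each written record takes the next sequence number.
        record = Record(key, value, self.next_seq)
        self.next_seq += 1
        return record


def merge_latest(records: List[Record]) -> Dict[int, str]:
    """Keep, for each key, the value with the highest sequence number."""
    latest: Dict[int, Record] = {}
    for r in records:
        current = latest.get(r.key)
        if current is None or r.seq > current.seq:
            latest[r.key] = r
    return {k: r.value for k, r in latest.items()}


def run(carry_over_state: bool) -> Dict[int, str]:
    old = SketchWriter()
    written = [old.write(1, "v1"), old.write(1, "v2")]
    # Schema change: the writer is recreated, with or without the old counter.
    new = SketchWriter(old.next_seq) if carry_over_state else SketchWriter()
    written.append(new.write(1, "v3"))
    return merge_latest(written)
```

In this sketch, `run(False)` restarts the counter, so the newest write ("v3") 
gets a lower sequence number than "v2" and loses the merge. `run(True)` 
carries the counter over, and "v3" wins as expected.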

The following pictures show the problem we are currently facing:
1) Schema evolution happened between the second and third 
files (`{*}schema_id{*}` changed).
!image-2024-11-20-13-04-53-635.png!

2) The sequence numbers here are expected to be increasing. However, there is 
an overlap of `{*}min_sequence_number{*}` between the second file and the 
third file.
!image-2024-11-20-13-02-47-612.png!

Because the sequence numbers are mixed up, we may read the update-before data.
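
The overlap described in item 2 can be checked mechanically. The sketch below 
is an illustration only: `FileMeta` and `overlapping_pairs` are hypothetical 
names, the numbers are made up rather than read from the attached 
screenshots, and it assumes that "overlap" means a later file's 
`min_sequence_number` is not greater than the previous file's maximum 
sequence number.

```python
from typing import List, NamedTuple, Tuple


class FileMeta(NamedTuple):
    name: str
    schema_id: int
    min_sequence_number: int
    max_sequence_number: int


def overlapping_pairs(files: List[FileMeta]) -> List[Tuple[str, str]]:
    """Return consecutive files (in write order) whose sequence ranges overlap."""
    pairs = []
    for prev, curr in zip(files, files[1:]):
        if curr.min_sequence_number <= prev.max_sequence_number:
            pairs.append((prev.name, curr.name))
    return pairs


# Illustrative values only; they are not taken from the attached screenshots.
files = [
    FileMeta("file-1", 0, 0, 9),
    FileMeta("file-2", 0, 10, 19),
    FileMeta("file-3", 1, 0, 4),  # written after the schema change, counter restarted
]
```

With these sample values, `overlapping_pairs(files)` reports only the pair 
around the schema change, which matches the pattern described above.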

 

  was:
When schema evolution happened, we will prepare commit and recreate a new 
FileStoreWrite to obtain the latest schema. However, FileStoreWrite maintain 
some information like sequence number in memory, we can't directly remove and 
recreate one FileStoreWrite, instead, we should extract the information of 
Write and rebuild with this information.
The  sequence number is used to determine the order of data with two identical 
primary keys, If we don't strictly maintain this order, it may lead to 
unexpected situations.

The following picture show The problem we are currently facing:
1) Schema evolution happened between the second and third files(schema_id 
changed)
!image-2024-11-20-13-04-53-635.png!

2)The expected sequence number here should be increasing, however, there is an 
overlap of min_sequence_number between the third file and the second file.
!image-2024-11-20-13-02-47-612.png!

Due to the confusion of sequence numbers, we may read the data of update-before.

 


> Paimon connector would reuse sequence number when schema evolution happened
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-36750
>                 URL: https://issues.apache.org/jira/browse/FLINK-36750
>             Project: Flink
>          Issue Type: Improvement
>          Components: Flink CDC
>    Affects Versions: cdc-3.2.0
>            Reporter: Yanquan Lv
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: cdc-3.2.1
>
>         Attachments: image-2024-11-20-13-00-58-282.png, 
> image-2024-11-20-13-02-47-612.png, image-2024-11-20-13-04-53-635.png
>
>
> When schema evolution happens, we prepare a commit and recreate the 
> FileStoreWrite to pick up the latest schema. However, FileStoreWrite keeps 
> some information, such as the sequence number, in memory, so we can't simply 
> discard it and create a new one. Instead, we should extract that information 
> from the old writer and rebuild the new one with it.
> The sequence number determines the order of records that share the same 
> primary key. If this order is not strictly maintained, it may lead to 
> unexpected results.
> The following pictures show the problem we are currently facing:
> 1) Schema evolution happened between the second and third 
> files (`{*}schema_id{*}` changed).
> !image-2024-11-20-13-04-53-635.png!
> 2) The sequence numbers here are expected to be increasing. However, there 
> is an overlap of `{*}min_sequence_number{*}` between the second file and the 
> third file.
> !image-2024-11-20-13-02-47-612.png!
> Because the sequence numbers are mixed up, we may read the update-before 
> data.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
