[jira] [Updated] (FLINK-36701) Pipeline failover again after handling a schema change event as the first event after a failover

Leonard Xu (Jira) Wed, 08 Jan 2025 22:46:06 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-36701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Leonard Xu updated FLINK-36701:
-------------------------------
    Fix Version/s: cdc-3.3.0

> Pipeline failover again after handling a schema change event as the first 
> event after a failover
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-36701
>                 URL: https://issues.apache.org/jira/browse/FLINK-36701
>             Project: Flink
>          Issue Type: Bug
>          Components: Flink CDC
>    Affects Versions: cdc-3.2.0, cdc-3.3.0
>            Reporter: zjjiang
>            Assignee: zjjiang
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: cdc-3.3.0
>
>         Attachments: image-2024-11-13-16-29-30-065.png
>
>
> Currently, directly after a failover, when the pipeline first handles a 
> schema change event (e.g. addColumnEvent) and then a DataChangeEvent, it may 
> cause the job to fail again as sink has repeatedly applied that schema change.
> The cause of the problem can be explained as follows:
> 1. SinkWriterOperator now requests the latest schema when it receives the 
> first non-createTableEvent schema change event (assuming there is no schema 
> in the local cache).
> 2. The schema manager applies the schema change after confirming flush 
> success.
> 3. Assume that the sequence after failover is to process a schema change 
> event first, followed by a data change event.
> On the schema manager side, the schema manager will apply the schema change 
> event to its cached schema(i.e. evolvedSchema) after confirming a successful 
> flush.
> On the SinkWriterOperator side, the processing flow is:
> 1) Handle the flushEvent;
> 2) Handle the schema change event (in this step, the latest schema will be 
> fetched from the schema manager and sent downstream; then the schema change 
> event will be emitted) – note that this step does not report an error;
> 3) Handle the data change – here the failover occurs because the data record 
> column size does not match the schema.
> !image-2024-11-13-16-29-30-065.png|width=1436,height=960!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (FLINK-36701) Pipeline failover again after handling a schema change event as the first event after a failover

Reply via email to