[jira] [Created] (FLINK-36931) FlinkCDC YAML supports synchronizing the full amount of data of the entire database in Batch mode

Wenkai Qi (Jira) Wed, 18 Dec 2024 07:05:54 -0800

Wenkai Qi created FLINK-36931:
---------------------------------

             Summary: FlinkCDC YAML supports synchronizing the full amount of 
data of the entire database in Batch mode
                 Key: FLINK-36931
                 URL: https://issues.apache.org/jira/browse/FLINK-36931
             Project: Flink
          Issue Type: New Feature
          Components: Flink CDC
            Reporter: Wenkai Qi



h1. Background

The MySQL CDC source in Flink CDC supports {*}StartupMode.SNAPSHOT{*}. In that 
mode the source is {*}Boundedness.BOUNDED{*} and can run in 
{*}RuntimeExecutionMode.BATCH{*}.

h1. Expectation

Flink CDC YAML jobs should likewise support {*}StartupMode.SNAPSHOT{*}, expose 
the source as {*}Boundedness.BOUNDED{*}, and be able to run in 
{*}RuntimeExecutionMode.BATCH{*}.
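
The relationship between the three settings above can be sketched in plain Java. 
This is only an illustration under stated assumptions: the enums are local 
stand-ins that borrow the names used in this issue (they are not Flink's own 
classes), and BatchEligibilitySketch, boundednessOf and resolve are hypothetical 
names. The sketch assumes that only a snapshot-only startup mode makes the 
source bounded, and that batch mode is rejected for an unbounded source.

```java
final class BatchEligibilitySketch {
    // Local stand-ins that mirror names from the issue; not Flink's classes.
    enum StartupMode { INITIAL, SNAPSHOT, LATEST_OFFSET }
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }
    enum RuntimeExecutionMode { STREAMING, BATCH }

    /** Assumption: only a snapshot-only startup mode yields a bounded source. */
    static Boundedness boundednessOf(StartupMode mode) {
        return mode == StartupMode.SNAPSHOT
                ? Boundedness.BOUNDED
                : Boundedness.CONTINUOUS_UNBOUNDED;
    }

    /** Returns the requested mode, rejecting BATCH on an unbounded source. */
    static RuntimeExecutionMode resolve(StartupMode mode, RuntimeExecutionMode requested) {
        if (requested == RuntimeExecutionMode.BATCH
                && boundednessOf(mode) != Boundedness.BOUNDED) {
            throw new IllegalStateException(
                    "BATCH mode requires a bounded source, but startup mode is " + mode);
        }
        return requested;
    }

    public static void main(String[] args) {
        // A snapshot-only job may run in batch mode.
        System.out.println(resolve(StartupMode.SNAPSHOT, RuntimeExecutionMode.BATCH));
        // A job that continues reading changes stays in streaming mode.
        System.out.println(resolve(StartupMode.INITIAL, RuntimeExecutionMode.STREAMING));
    }
}
```

A YAML job would need an equivalent check when it translates the pipeline 
definition, so that a batch request combined with a non-snapshot startup mode 
fails early instead of at runtime.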
h1. Benefits

 
 # Flink's batch-mode performance optimizations (for example dynamic 
partition pruning and Hybrid Shuffle) become available. Which of these batch 
optimizations to adopt still needs to be discussed.
 # A full snapshot of an entire database can be synchronized to backfill data 
in an offline-computing fashion. In the future, this could extend to full 
whole-database synchronization for other databases and data lakes.

h1. Under consideration

 
 # The sink needs to switch to batch mode. See 
[https://github.com/apache/flink-cdc/pull/3646#pullrequestreview-2491309306]
 # Trigger a single checkpoint with a checkpoint ID of Long.MAX_VALUE, and have 
the sink perform its final commit based on that ID.
 # Sinks that directly support batch writing (such as DorisSink).
 # ... (to be supplemented)
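
The second option in the list above (a single final checkpoint whose ID is 
Long.MAX_VALUE) can be sketched as follows. This is a minimal, self-contained 
model rather than Flink CDC code: FinalCheckpointCommitSketch, BufferingSink 
and runBoundedJob are hypothetical names, and the sink simply buffers records 
in memory until it is notified of the final checkpoint ID.

```java
import java.util.ArrayList;
import java.util.List;

final class FinalCheckpointCommitSketch {
    // The single checkpoint ID proposed in the issue for the final commit.
    static final long FINAL_CHECKPOINT_ID = Long.MAX_VALUE;

    /** Hypothetical sink that stages writes and commits only on the final ID. */
    static final class BufferingSink {
        private final List<String> pending = new ArrayList<>();
        private final List<String> committed = new ArrayList<>();

        void write(String record) {
            pending.add(record);
        }

        /** Commits all staged records if this is the final checkpoint. */
        void notifyCheckpointComplete(long checkpointId) {
            if (checkpointId == FINAL_CHECKPOINT_ID) {
                committed.addAll(pending);
                pending.clear();
            }
        }

        List<String> committed() {
            return new ArrayList<>(committed);
        }

        int pendingCount() {
            return pending.size();
        }
    }

    /** Writes every snapshot row, then signals the one final checkpoint. */
    static List<String> runBoundedJob(List<String> snapshotRows) {
        BufferingSink sink = new BufferingSink();
        for (String row : snapshotRows) {
            sink.write(row);
        }
        // End of bounded input: no periodic checkpoints ran, so commit once here.
        sink.notifyCheckpointComplete(FINAL_CHECKPOINT_ID);
        return sink.committed();
    }

    public static void main(String[] args) {
        System.out.println(runBoundedJob(List.of("row-1", "row-2", "row-3")));
    }
}
```

In this model, a notification with any other checkpoint ID leaves the staged 
records uncommitted, which is the behavior a sink would need if it is to 
"make the final submission based on this id".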



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
