Wenkai Qi created FLINK-36931:
---------------------------------

             Summary: FlinkCDC YAML supports synchronizing the full amount of data of the entire database in Batch mode
                 Key: FLINK-36931
                 URL: https://issues.apache.org/jira/browse/FLINK-36931
             Project: Flink
          Issue Type: New Feature
          Components: Flink CDC
            Reporter: Wenkai Qi

h1. Background

The MySQL CDC source in Flink CDC supports *StartupMode.SNAPSHOT*. In that mode it is of *Boundedness.BOUNDED* and can run in *RuntimeExecutionMode.BATCH*.

h1. Expectation

Flink CDC YAML jobs should likewise support *StartupMode.SNAPSHOT*, be of *Boundedness.BOUNDED*, and run in *RuntimeExecutionMode.BATCH*.

h1. Benefits

# Flink's batch-mode performance optimizations (dynamic partition pruning, Hybrid Shuffle) become available. Which of these optimizations will actually be used still needs to be discussed.
# The full data of an entire database can be synchronized as an offline (batch) job to backfill data. In the future, this could extend to full-database synchronization for other databases and data lakes.

h1. Under consideration

# The sink needs to switch to batch mode. [https://github.com/apache/flink-cdc/pull/3646#pullrequestreview-2491309306]
# Trigger a single checkpoint with a checkpoint ID of Long.MAX_VALUE, and have the sink perform its final commit based on this ID.
# The sink supports batch writing directly (for example, DorisSink).
# ... (to be supplemented)
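
A minimal sketch of the idea of a single final checkpoint with ID Long.MAX_VALUE, assuming a sink that stages writes and only publishes them once it is told a checkpoint has completed. Every name in it (FinalCommitSketch, BufferingSink, snapshot, notifyCheckpointComplete, runBoundedJob) is hypothetical and is not an API of Flink or Flink CDC; it only models the commit ordering, not the real sink interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a staged-commit sink; not a Flink or Flink CDC class. */
public class FinalCommitSketch {

    /** ID of the single, final checkpoint of a bounded job (taken from the issue text). */
    static final long FINAL_CHECKPOINT_ID = Long.MAX_VALUE;

    static class BufferingSink {
        // Records written since the last snapshot.
        private final List<String> current = new ArrayList<>();
        // Staged batches, keyed by the checkpoint ID that sealed them.
        private final TreeMap<Long, List<String>> pending = new TreeMap<>();
        // Records that have been made visible downstream.
        private final List<String> committed = new ArrayList<>();

        void write(String record) {
            current.add(record);
        }

        /** Seals everything written so far under the given checkpoint ID. */
        void snapshot(long checkpointId) {
            pending.put(checkpointId, new ArrayList<>(current));
            current.clear();
        }

        /** Commits every staged batch whose checkpoint ID is <= checkpointId. */
        void notifyCheckpointComplete(long checkpointId) {
            Map<Long, List<String>> ready = pending.headMap(checkpointId, true);
            for (List<String> batch : ready.values()) {
                committed.addAll(batch);
            }
            ready.clear(); // headMap is a view, so this removes the entries from pending
        }

        List<String> committed() {
            return committed;
        }
    }

    /** Writes all snapshot records, then performs exactly one final commit. */
    static List<String> runBoundedJob(List<String> snapshotRecords) {
        BufferingSink sink = new BufferingSink();
        for (String record : snapshotRecords) {
            sink.write(record);
        }
        sink.snapshot(FINAL_CHECKPOINT_ID);
        sink.notifyCheckpointComplete(FINAL_CHECKPOINT_ID);
        return sink.committed();
    }

    public static void main(String[] args) {
        System.out.println(runBoundedJob(List.of("row-1", "row-2", "row-3")));
    }
}
```

Because Long.MAX_VALUE is greater than or equal to any other checkpoint ID, committing "everything up to this ID" flushes all staged data in one step. That is why a single call at the end of input would suffice in this model. Whether the real sinks can rely on this is part of what the issue leaves open for discussion.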