[ https://issues.apache.org/jira/browse/FLINK-36931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated FLINK-36931:
-----------------------------------
    Labels: pull-request-available  (was: )

> FlinkCDC YAML supports synchronizing the full amount of data of the entire database in Batch mode
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-36931
>                 URL: https://issues.apache.org/jira/browse/FLINK-36931
>             Project: Flink
>          Issue Type: New Feature
>          Components: Flink CDC
>            Reporter: Wenkai Qi
>            Assignee: Wenkai Qi
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> h1. Background
> The MySQL CDC source in Flink CDC supports *StartupMode.SNAPSHOT*. In that mode it is of {*}Boundedness.BOUNDED{*} and can run in {*}RuntimeExecutionMode.BATCH{*}.
> h1. Expectation
> Flink CDC YAML jobs should also support {*}StartupMode.SNAPSHOT{*}, be of {*}Boundedness.BOUNDED{*}, and be able to run in {*}RuntimeExecutionMode.BATCH{*}.
> h1. Benefits
>
> # The job can take advantage of Flink batch performance improvements (dynamic partition pruning, Hybrid Shuffle). Which batch-mode optimizations to use still needs to be discussed.
> # The full data of an entire database can be synchronized to backfill data in an offline-computing manner. In the future, this could be extended to full-database synchronization for other databases and data lakes.
> h1. Under consideration
>
> # The sink needs to switch to Batch mode. [https://github.com/apache/flink-cdc/pull/3646#pullrequestreview-2491309306]
> # For a 2PC sink, trigger a single checkpoint with checkpoint ID Long.MAX_VALUE; the sink should make its final commit based on this ID.
> # Sinks that directly support batch writing (such as DorisSink).
> # ... (to be supplemented)
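
The 2PC item under "Under consideration" can be illustrated with a small, self-contained sketch. It is only a simplified model of the idea and uses no Flink or Flink CDC API: the class and method names (BatchTwoPhaseCommitSketch, TwoPhaseSink, preCommit, commit, runBounded) are hypothetical. The only detail taken from the issue is that the single end-of-input checkpoint uses Long.MAX_VALUE as its ID.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of a two-phase-commit sink; not a Flink or Flink CDC API. */
final class BatchTwoPhaseCommitSketch {

    /** Checkpoint ID reserved for the single end-of-input commit, as proposed in the issue. */
    static final long FINAL_CHECKPOINT_ID = Long.MAX_VALUE;

    static final class TwoPhaseSink {
        private final List<String> pending = new ArrayList<>();   // written, not yet visible
        private final List<String> committed = new ArrayList<>(); // visible downstream
        private long preCommittedId = -1L;                        // -1 means nothing is staged

        void write(String record) {
            pending.add(record);
        }

        /** Phase 1: mark the pending records as staged under the given checkpoint ID. */
        void preCommit(long checkpointId) {
            preCommittedId = checkpointId;
        }

        /** Phase 2: publish the staged records, but only for the matching checkpoint ID. */
        void commit(long checkpointId) {
            if (checkpointId != preCommittedId) {
                throw new IllegalStateException("No pre-commit for checkpoint " + checkpointId);
            }
            committed.addAll(pending);
            pending.clear();
            preCommittedId = -1L;
        }

        /** Returns a copy of the records that have been committed so far. */
        List<String> committed() {
            return new ArrayList<>(committed);
        }
    }

    /** Writes a bounded input, then triggers exactly one final checkpoint. */
    static List<String> runBounded(List<String> snapshotRows) {
        TwoPhaseSink sink = new TwoPhaseSink();
        for (String row : snapshotRows) {
            sink.write(row);
        }
        // No periodic checkpoints in this model: one pre-commit and one commit at end of input.
        sink.preCommit(FINAL_CHECKPOINT_ID);
        sink.commit(FINAL_CHECKPOINT_ID);
        return sink.committed();
    }

    public static void main(String[] args) {
        // Prints [row-1, row-2, row-3]
        System.out.println(runBounded(List.of("row-1", "row-2", "row-3")));
    }
}
```

In this model nothing becomes visible until the final commit, which is the behavior the item asks of a 2PC sink in batch mode. How a real sink would be told that the input has ended is left open here, as it is in the issue.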