[jira] [Updated] (FLINK-36931) FlinkCDC YAML supports synchronizing the full amount of data of the entire database in Batch mode

ASF GitHub Bot (Jira) Mon, 23 Dec 2024 07:58:04 -0800


     [ 
https://issues.apache.org/jira/browse/FLINK-36931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ASF GitHub Bot updated FLINK-36931:
-----------------------------------
    Labels: pull-request-available  (was: )

> FlinkCDC YAML supports synchronizing the full amount of data of the entire 
> database in Batch mode
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-36931
>                 URL: https://issues.apache.org/jira/browse/FLINK-36931
>             Project: Flink
>          Issue Type: New Feature
>          Components: Flink CDC
>            Reporter: Wenkai Qi
>            Assignee: Wenkai Qi
>            Priority: Major
>              Labels: pull-request-available
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> h1. Background
> The MySQL CDC source in Flink CDC supports *StartupMode.SNAPSHOT*. In that 
> mode it is {*}Boundedness.BOUNDED{*} and can run in 
> {*}RuntimeExecutionMode.BATCH{*}.
> h1. Expectation
> FlinkCDC YAML jobs should support {*}StartupMode.SNAPSHOT{*}, be 
> {*}Boundedness.BOUNDED{*}, and run in {*}RuntimeExecutionMode.BATCH{*}.
> h1. Benefits
>  
>  # Flink's batch-mode performance optimizations (such as dynamic partition 
> pruning and Hybrid Shuffle) become available. Which of these optimizations 
> to use still needs to be discussed.
>  # A full snapshot of the entire database can be synchronized to backfill 
> data in an offline computing manner. In the future, this could be extended 
> to full synchronization of the entire database for other databases and 
> data lakes.
> h1. Under consideration
>  
>  # The sink needs to switch to Batch mode. See 
> [https://github.com/apache/flink-cdc/pull/3646#pullrequestreview-2491309306]
>  # For a 2PC (two-phase commit) sink, trigger a single checkpoint with a 
> checkpoint ID of Long.MAX_VALUE, and have the sink perform its final commit 
> based on this ID.
>  # Sinks that directly support Batch writing (such as DorisSink).
>  # ... (to be supplemented)
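
The 2PC item above can be illustrated with a small, self-contained sketch. This is only a toy model under stated assumptions, not the Flink or Flink CDC implementation: the class BufferingTwoPhaseSink, its methods, and the constant FINAL_CHECKPOINT_ID are hypothetical names introduced here. It shows a sink that buffers records during a bounded run, seals them once under a checkpoint ID of Long.MAX_VALUE, and publishes them in a final commit keyed by that ID.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a two-phase-commit sink finishing a bounded (batch) job. */
class FinalCheckpointSinkSketch {

    /** Assumption: the single, final checkpoint of a batch run uses this ID. */
    static final long FINAL_CHECKPOINT_ID = Long.MAX_VALUE;

    /** Hypothetical sink; this is not a Flink or Flink CDC class. */
    static class BufferingTwoPhaseSink {
        private final List<String> pending = new ArrayList<>();
        private final Map<Long, List<String>> preCommitted = new HashMap<>();
        private final List<String> committed = new ArrayList<>();

        /** Buffers a record; nothing is visible downstream yet. */
        void write(String record) {
            pending.add(record);
        }

        /** Phase 1: seal everything written so far under the checkpoint ID. */
        void preCommit(long checkpointId) {
            preCommitted
                    .computeIfAbsent(checkpointId, id -> new ArrayList<>())
                    .addAll(pending);
            pending.clear();
        }

        /** Phase 2: publish the sealed records; repeated calls are no-ops. */
        void commit(long checkpointId) {
            List<String> sealed = preCommitted.remove(checkpointId);
            if (sealed != null) {
                committed.addAll(sealed);
            }
        }

        /** Returns an immutable copy of the records published so far. */
        List<String> committed() {
            return List.copyOf(committed);
        }
    }

    public static void main(String[] args) {
        BufferingTwoPhaseSink sink = new BufferingTwoPhaseSink();
        sink.write("row-1");
        sink.write("row-2");

        // End of bounded input: one pre-commit and one commit with the final ID.
        sink.preCommit(FINAL_CHECKPOINT_ID);
        sink.commit(FINAL_CHECKPOINT_ID);

        System.out.println(sink.committed());
    }
}
```

Keying the final commit on one well-known ID lets a retried commit be detected and skipped, which is why commit() in this sketch removes the sealed batch before publishing it. Whether a real sink should behave this way is part of what the issue leaves open for discussion.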



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
