[ 
https://issues.apache.org/jira/browse/FLINK-36931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenkai Qi updated FLINK-36931:
------------------------------
    Description: 
h1. Background

The MySQL CDC source in Flink CDC supports {*}StartupMode.SNAPSHOT{*}. With 
that startup mode the source is of {*}Boundedness.BOUNDED{*} and can run in 
{*}RuntimeExecutionMode.BATCH{*}.
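
The relationship between startup mode, boundedness, and execution mode can be sketched as a toy model. This is only an illustration: the enums below borrow names from the issue text but are simplified stand-ins, not the real Flink or Flink CDC classes, and the helper names boundednessOf and canRunIn are hypothetical.

```java
import java.util.List;

// Toy model: these enums only borrow names from the issue text and are
// simplified stand-ins, not the real Flink or Flink CDC classes.
final class BoundednessSketch {
    enum StartupMode { INITIAL, LATEST_OFFSET, SNAPSHOT }
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }
    enum RuntimeExecutionMode { STREAMING, BATCH }

    // Assumption: only a snapshot-only read terminates, so only it is bounded.
    static Boundedness boundednessOf(StartupMode mode) {
        return mode == StartupMode.SNAPSHOT
                ? Boundedness.BOUNDED
                : Boundedness.CONTINUOUS_UNBOUNDED;
    }

    // Batch execution needs every source to be bounded; streaming accepts both.
    static boolean canRunIn(RuntimeExecutionMode mode, List<Boundedness> sources) {
        if (mode == RuntimeExecutionMode.STREAMING) {
            return true;
        }
        return sources.stream().allMatch(b -> b == Boundedness.BOUNDED);
    }

    public static void main(String[] args) {
        for (StartupMode mode : StartupMode.values()) {
            Boundedness b = boundednessOf(mode);
            boolean batchOk = canRunIn(RuntimeExecutionMode.BATCH, List.of(b));
            System.out.println(mode + " -> " + b + ", batch allowed: " + batchOk);
        }
    }
}
```

In this model only the snapshot startup mode yields a bounded source, which is why it is the only one for which batch execution is allowed.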
h1. Expectation

Flink CDC YAML jobs should also support {*}StartupMode.SNAPSHOT{*}, be of 
{*}Boundedness.BOUNDED{*}, and be able to run in 
{*}RuntimeExecutionMode.BATCH{*}.
h1. Benefits

 
 # Flink batch-mode performance optimizations can be used (dynamic partition 
pruning, Hybrid Shuffle). Which batch-mode optimizations will actually be used 
still needs to be discussed.
 # The full data of an entire database can be synchronized as an offline job 
to supplement data. In the future, this could even support full-database 
synchronization for other databases and data lakes.

h1. Under consideration

 
 # The sink needs to switch to batch mode. 
[https://github.com/apache/flink-cdc/pull/3646#pullrequestreview-2491309306]
 # For two-phase-commit (2PC) sinks, trigger a single checkpoint with 
checkpoint ID Long.MAX_VALUE; the sink should make its final commit based on 
this ID.
 # The sink directly supports batch writing (such as DorisSink).
 # ... (to be supplemented)
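
The 2PC item above can be illustrated with a minimal sketch, assuming a sink that stages writes per checkpoint ID and publishes them only on commit. ToyTwoPhaseSink, finishBatch, and FINAL_CHECKPOINT_ID are hypothetical names used only for illustration; they are not Flink or Flink CDC APIs, and the real design is still under discussion.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the proposal: in batch mode there are no periodic checkpoints,
// so a single final "checkpoint" with ID Long.MAX_VALUE drives the commit.
final class FinalCommitSketch {
    static final long FINAL_CHECKPOINT_ID = Long.MAX_VALUE;

    // Hypothetical two-phase-commit sink; not a Flink interface.
    static final class ToyTwoPhaseSink {
        private final List<String> buffer = new ArrayList<>();
        private final Map<Long, List<String>> staged = new HashMap<>();
        private final List<String> committed = new ArrayList<>();

        void write(String record) {
            buffer.add(record);
        }

        // Phase 1: stage everything written so far under the checkpoint ID.
        void preCommit(long checkpointId) {
            staged.computeIfAbsent(checkpointId, id -> new ArrayList<>()).addAll(buffer);
            buffer.clear();
        }

        // Phase 2: publish the data staged under that checkpoint ID, if any.
        void commit(long checkpointId) {
            List<String> batch = staged.remove(checkpointId);
            if (batch != null) {
                committed.addAll(batch);
            }
        }

        List<String> committed() {
            return List.copyOf(committed);
        }
    }

    // Called once at end of input: one pre-commit and one commit, same ID.
    static void finishBatch(ToyTwoPhaseSink sink) {
        sink.preCommit(FINAL_CHECKPOINT_ID);
        sink.commit(FINAL_CHECKPOINT_ID);
    }

    public static void main(String[] args) {
        ToyTwoPhaseSink sink = new ToyTwoPhaseSink();
        sink.write("row-1");
        sink.write("row-2");
        System.out.println("before final commit: " + sink.committed());
        finishBatch(sink);
        System.out.println("after final commit: " + sink.committed());
    }
}
```

In this sketch nothing becomes visible until finishBatch runs, which is the behavior the item asks of a 2PC sink when the job has no regular checkpoints.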

  was:
h1. Background

The MySQL CDC source in Flink CDC supports {*}StartupMode.SNAPSHOT{*}. With 
that startup mode the source is of {*}Boundedness.BOUNDED{*} and can run in 
{*}RuntimeExecutionMode.BATCH{*}.
h1. Expectation

Flink CDC YAML jobs should also support {*}StartupMode.SNAPSHOT{*}, be of 
{*}Boundedness.BOUNDED{*}, and be able to run in 
{*}RuntimeExecutionMode.BATCH{*}.
h1. Benefits

 
 # Flink batch-mode performance optimizations can be used (dynamic partition 
pruning, Hybrid Shuffle). Which batch-mode optimizations will actually be used 
still needs to be discussed.
 # The full data of an entire database can be synchronized as an offline job 
to supplement data. In the future, this could even support full-database 
synchronization for other databases and data lakes.

h1. Under consideration

 
 # The sink needs to switch to batch mode. 
[https://github.com/apache/flink-cdc/pull/3646#pullrequestreview-2491309306]
 # Trigger a single checkpoint with checkpoint ID Long.MAX_VALUE; the sink 
should make its final commit based on this ID.
 # The sink directly supports batch writing (such as DorisSink).
 # ... (to be supplemented)


> FlinkCDC YAML supports synchronizing the full amount of data of the entire 
> database in Batch mode
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-36931
>                 URL: https://issues.apache.org/jira/browse/FLINK-36931
>             Project: Flink
>          Issue Type: New Feature
>          Components: Flink CDC
>            Reporter: Wenkai Qi
>            Priority: Major
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> h1. Background
> The MySQL CDC source in Flink CDC supports {*}StartupMode.SNAPSHOT{*}. With 
> that startup mode the source is of {*}Boundedness.BOUNDED{*} and can run in 
> {*}RuntimeExecutionMode.BATCH{*}.
> h1. Expectation
> Flink CDC YAML jobs should also support {*}StartupMode.SNAPSHOT{*}, be of 
> {*}Boundedness.BOUNDED{*}, and be able to run in 
> {*}RuntimeExecutionMode.BATCH{*}.
> h1. Benefits
>  
>  # Flink batch-mode performance optimizations can be used (dynamic partition 
> pruning, Hybrid Shuffle). Which batch-mode optimizations will actually be 
> used still needs to be discussed.
>  # The full data of an entire database can be synchronized as an offline job 
> to supplement data. In the future, this could even support full-database 
> synchronization for other databases and data lakes.
> h1. Under consideration
>  
>  # The sink needs to switch to batch mode. 
> [https://github.com/apache/flink-cdc/pull/3646#pullrequestreview-2491309306]
>  # For two-phase-commit (2PC) sinks, trigger a single checkpoint with 
> checkpoint ID Long.MAX_VALUE; the sink should make its final commit based on 
> this ID.
>  # The sink directly supports batch writing (such as DorisSink).
>  # ... (to be supplemented)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
