[jira] [Updated] (FLINK-36683) Support metadata 'row_kind' virtual column for Mongo CDC Connector

Runkang He (Jira) Sat, 09 Nov 2024 00:03:05 -0800


     [ https://issues.apache.org/jira/browse/FLINK-36683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Runkang He updated FLINK-36683:
-------------------------------
    Description: 
'row_kind' metadata is very useful in real-world user scenarios. The two main
scenarios are:

1. Save all upstream messages: In this scenario, the downstream system saves
every message from upstream, including delete messages. To achieve this, the
full changelog must be converted into append-only messages, with the row_kind
metadata column recording the kind of each change (insert, update, or delete).

2. Ignore upstream delete messages: In this scenario, the upstream CDC source
periodically deletes historical data to save storage space, retaining only the
last seven days of data. However, the business requires the downstream OLAP
system to retain the full history, so the delete messages from the source must
be ignored.

So I think the Mongo CDC Connector should support the 'row_kind' metadata
column.
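A Flink-free Java sketch may help make the two scenarios concrete. It models a changelog as a plain list rather than a Flink stream, and it assumes the column would carry the short strings of Flink's RowKind ("+I", "-U", "+U", "-D"). RowKindTag, ChangeRecord, AppendOnlyRow, toAppendOnly and dropDeletes are hypothetical names for illustration, not connector APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class RowKindSketch {

    // Hypothetical stand-in for Flink's RowKind; the strings are ASSUMED to be
    // the values a 'row_kind' metadata column would expose.
    enum RowKindTag {
        INSERT("+I"), UPDATE_BEFORE("-U"), UPDATE_AFTER("+U"), DELETE("-D");

        final String shortString;

        RowKindTag(String shortString) {
            this.shortString = shortString;
        }
    }

    // A changelog event as a CDC source might emit it (illustrative only).
    record ChangeRecord(String id, String payload, RowKindTag kind) {}

    // An append-only row: the change kind becomes an ordinary row_kind field.
    record AppendOnlyRow(String id, String payload, String rowKind) {}

    // Scenario 1: keep every event, deletes included, as an appended row
    // whose row_kind field records what kind of change it was.
    static List<AppendOnlyRow> toAppendOnly(List<ChangeRecord> changelog) {
        List<AppendOnlyRow> rows = new ArrayList<>();
        for (ChangeRecord r : changelog) {
            rows.add(new AppendOnlyRow(r.id(), r.payload(), r.kind().shortString));
        }
        return rows;
    }

    // Scenario 2: drop DELETE events so the sink keeps rows the source purged.
    static List<ChangeRecord> dropDeletes(List<ChangeRecord> changelog) {
        List<ChangeRecord> kept = new ArrayList<>();
        for (ChangeRecord r : changelog) {
            if (r.kind() != RowKindTag.DELETE) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<ChangeRecord> changelog = List.of(
                new ChangeRecord("doc-1", "v1", RowKindTag.INSERT),
                new ChangeRecord("doc-1", "v1", RowKindTag.UPDATE_BEFORE),
                new ChangeRecord("doc-1", "v2", RowKindTag.UPDATE_AFTER),
                new ChangeRecord("doc-1", "v2", RowKindTag.DELETE));

        List<String> kinds = toAppendOnly(changelog).stream()
                .map(AppendOnlyRow::rowKind)
                .toList();
        System.out.println(kinds);                          // [+I, -U, +U, -D]
        System.out.println(dropDeletes(changelog).size());  // 3
    }
}
```

Scenario 1 keeps the delete as data and scenario 2 discards it. Both decisions need only the per-row change kind, which is why exposing it as a column would be enough for either use case.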

  was:
'row_kind' metadata is very useful in real-world user scenarios. The two main
scenarios are:

1. Save all upstream messages: In this scenario, the downstream system saves
every message from upstream, including delete messages. To achieve this, all
kinds of changelog messages must be converted into append-only messages, with
the row_kind metadata column representing the kind of each change.

2. Ignore upstream delete messages: In this scenario, the upstream CDC source
periodically deletes historical data to save storage space, retaining only the
last seven days of data. However, the business requires the downstream OLAP
system to retain the full history, so the delete messages from the source must
be ignored.

So I think the Mongo CDC Connector should support the 'row_kind' metadata
column.


> Support metadata 'row_kind' virtual column for Mongo CDC Connector
> ------------------------------------------------------------------
>
>                 Key: FLINK-36683
>                 URL: https://issues.apache.org/jira/browse/FLINK-36683
>             Project: Flink
>          Issue Type: Improvement
>          Components: Flink CDC
>    Affects Versions: cdc-3.3.0, cdc-3.2.1
>            Reporter: Runkang He
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)