[jira] [Commented] (FLINK-36611) Add schema info to output of Kafka sink

Andrew Otto (Jira) Mon, 28 Oct 2024 07:01:47 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-36611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17893468#comment-17893468
 ]


Andrew Otto commented on FLINK-36611:
-------------------------------------

This will be very useful!

 
Having Kafka in the middle has some advantages.
 
- A distributed log buffer: MariaDB read and downstream (e.g. Paimon table) 
write jobs are decoupled.
- CDC data in Kafka means it can be re-used for things. We can sink it to 
Paimon or Iceberg tables, and we could use kafka + debezium json source to 
implement other streaming jobs.
 
Thank you!
 
See our use case: [T373144 [SPIKE] Learn and document how to use Flink-CDC from 
MediaWiki MariaDB locally](https://phabricator.wikimedia.org/T373144)

> Add schema info to output of Kafka sink  
> -----------------------------------------
>
>                 Key: FLINK-36611
>                 URL: https://issues.apache.org/jira/browse/FLINK-36611
>             Project: Flink
>          Issue Type: New Feature
>          Components: Flink CDC
>    Affects Versions: cdc-3.3.0
>            Reporter: LvYanquan
>            Priority: Major
>             Fix For: cdc-3.3.0
>
>
> Currently, the output of Kafka sink in debezium format looks like this:
> {code:java}
> {
>   "before": {
>     "id": 4,
>     "name": "John",
>     "address": "New York",
>     "phone_number": "2222",
>     "age": 12
>   },
>   "after": {
>     "id": 4,
>     "name": "John",
>     "address": "New York",
>     "phone_number": "1234",
>     "age": 12
>   },
>   "op": "u",
>   "source": {
>     "db": null,
>     "table": "customers"
>   }
> } {code}
> It contains record data with full before/after and db info, but schema info 
> wasn't included. 
> However, In some scenarios, we need this information to determine the type of 
> data. For example, Paimon's Kafka CDC source requires this type information, 
> otherwise all types are considered String, refer to 
> [https://paimon.apache.org/docs/0.9/flink/cdc-ingestion/kafka-cdc/#supported-formats.]
> Considering that this will increase the data load, I suggest adding a 
> parameter to configure whether to enable it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (FLINK-36611) Add schema info to output of Kafka sink

Reply via email to