Re: Flink CDC to Paimon

Taher Koitawala Thu, 06 Mar 2025 20:15:57 -0800

Hi Leonard,
     Yes i did see Xianqian’s reply however i thought my email did not go
through as the community is often very active but I did not receive a
response until Xianqian’s reply.


Thank you Xianqian we are currently trying out the suggestions for altering
schema via the catalog.

Thank you Andrew Otto for the docs as i will I will take a look at this.




On Wed, 5 Mar 2025 at 5:47 AM, Leonard Xu <xbjt...@gmail.com> wrote:

> Hey Taher,
>
> Xianqian has replied your email, did you subscribe Flink dev mailing-list?
>
> Best,
> Leonard
>
> > 2025年2月11日 16:52，Xiqian YU <kono....@outlook.com> 写道：
> >
> > Hi Taher,
> >
> > Since we’re creating a DataStream-based pipeline job with SQL Server
> CDC, schema change events must be handled manually. A possible approach
> would be:
> >
> > 1) Enable schema change events with `.includeSchemaChanges(true)`
> option, so DDL events will be parsed and encoded in `SourceRecord`s.
> >
> > 2) Write a customized `DebeziumDeserializationSchema` class and parse
> schema change events.
> `MySqlEventDeserializer#deserializeSchemaChangeRecord` could be used as a
> reference [1].
> >
> > 3) Evolve sink schema of Paimon tables with `PaimonCatalog` manually.
> `PaimonMetadataApplier` [2] is an existing schema evolving implementation
> supporting a few frequently used schema change events.
> >
> > Also, CDC Pipeline framework [3] has provided a fully-automatic schema
> sensing and evolving solution, but unfortunately Microsoft SQL Server
> source is not supported yet until we close #3445 [4] or #3507 [5].
> >
> > [1]
> https://github.com/apache/flink-cdc/blob/master/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-mysql/src/main/java/org/apache/flink/cdc/connectors/mysql/source/MySqlEventDeserializer.java
> > [2]
> https://github.com/apache/flink-cdc/blob/master/flink-cdc-connect/flink-cdc-pipeline-connectors/flink-cdc-pipeline-connector-paimon/src/main/java/org/apache/flink/cdc/connectors/paimon/sink/PaimonMetadataApplier.java
> > [3]
> https://nightlies.apache.org/flink/flink-cdc-docs-release-3.3/docs/core-concept/data-pipeline/
> > [4] https://github.com/apache/flink-cdc/pull/3445
> > [5] https://github.com/apache/flink-cdc/pull/3507
> >
> > Best Regards,
> > Xiqian
> >
> > Taher Koitawala <taher...@gmail.com> 於 2025年2月11日 15:59 寫道：
> >
> > Hi Devs,
> >      As a POC we are trying to create a steaming pipeline from MSSQL cdc
> > to Paimon:
> >
> > To do this we are doing
> > 1. msSql server cdc operator
> > 2. Transform operator
> > 3. paimon sink
> >
> > We have written the cdc connector with is a
> JsonDebeziumDeserialisedSchema
> > String
> >
> > I wish to write this paimon in a table format with same columns as
> source.
> >
> > As far as i know paimon automatically handles schema updates like new
> field
> > additions.
> >
> > Please can someone point me on how to write this stream efficiently to
> > paimon table with schema updates?
> >
> > For now i have SouceFunction<String>
> >
> > Which is the record mentioned above!
> >
> > Regards,
> > Taher Koitawala
> >
>
>

Re: Flink CDC to Paimon

Reply via email to