Hi Yanquan,

We need to upgrade Paimon to 1.0... Iceberg snapshots in Paimon 0.9 do not work well.
Best,
Jingsong

On Mon, Dec 9, 2024 at 5:25 PM Yanquan Lv <decq12y...@gmail.com> wrote:
>
> Hi, John. Happy to hear that you've used this pipeline.
>
> As you said, Paimon's Iceberg Compatibility Mode is a new feature in Paimon
> v0.9. However, the Paimon version that the CDC module depends on is still
> v0.8, so we will need to wait for the bump to v0.9 to support this. I've
> already created a PR [1] for this version bump; perhaps you can help review
> or verify it. We plan to complete this in Flink CDC 3.3.0.
>
> [1] https://github.com/apache/flink-cdc/pull/3644
>
> On Dec 8, 2024, at 15:15, John Mwangi <john.mwa...@live.com> wrote:
>
> Hi Yanquan,
>
> Thanks for your response - we've been able to make a lot of progress on this
> project by switching to Debezium CDC for the time being.
>
> We're also experimenting with Paimon's Iceberg Compatibility Mode in v0.9
> and noticed the following:
>
> DESCRIBE iceberg_catalog.`users_ta_ice.db`.user_2; (works as expected)
> SELECT * FROM iceberg_catalog.`users_ta_ice.db`.user_2; (throws "TableNotExistException")
> SHOW DATABASES IN iceberg_catalog; (database names have a .db suffix)
>
> I'm not sure whether the database names carrying a .db suffix contributes to
> the exception.
>
> Is this an issue that you've come across?
>
> For context, this is the link to our setup on GitHub.
>
> Regards,
> John Mwangi
> ________________________________
> From: Yanquan Lv <decq12y...@gmail.com>
> Sent: 28 October 2024 13:22
> To: Andrew Otto <o...@wikimedia.org>
> Cc: User <user@flink.apache.org>; u...@paimon.apache.org <u...@paimon.apache.org>; John Mwangi <john.mwa...@live.com>
> Subject: Re: Flink CDC -> Kafka -> Paimon?
>
> Hi, Andrew.
> Yes, currently the output from the Kafka pipeline doesn't contain schema
> info, so in the Paimon action it will be treated as STRING type.
> Your suggestion is very meaningful.
> I plan to support this feature in the next version of Flink CDC (Flink CDC
> 3.3), which may be enabled through a parameter [1].
>
> Also, a Paimon sink is already available in Flink CDC, so we can directly
> write data from MariaDB to Paimon to reduce the number of components and
> links that need to be maintained. We will also follow up on any issues
> encountered in the Paimon pipeline sink.
>
> [1] https://issues.apache.org/jira/browse/FLINK-36611
>
> On Oct 26, 2024, at 11:32, Andrew Otto <o...@wikimedia.org> wrote:
>
> Hi!
>
> I really like Flink CDC's pipeline connectors! So simple!
> I also like Paimon's CDC ingestion action CLI.
>
> I like these because I don't need to specify the schemas; they are inferred
> from the source. I also like the schema evolution support!
>
> Paimon's recent Iceberg Compatibility Mode looks cool!
>
> I'd like to accomplish the following:
>
> MariaDB CDC -> Kafka
> Kafka -> Paimon
> Query with Spark+Iceberg
>
> I can do MariaDB CDC -> Kafka with the flink-cdc pipeline connector.
> However, I'm stuck on Kafka -> Paimon.
>
> Paimon's Kafka CDC action docs say:
>
> > Usually, debezium-json contains a ‘schema’ field, from which Paimon will
> > retrieve data types. Make sure your debezium json has this field, or
> > Paimon will use ‘STRING’ type.
>
> However, the messages generated by flink-cdc's pipeline connector do not
> have a schema field. They look like:
>
> {
>   "before": null,
>   "after": {
>     "rev_id": 37,
>     ...
>   },
>   "op": "c"
> }
>
> The only reference I can find to a schema field is in the debezium-json
> format documentation:
>
> > users may setup the Debezium Kafka Connect with the Kafka configuration
> > 'value.converter.schemas.enable' enabled to include schema in the message
>
> This leads me to believe that the schema field Paimon's doc is referring to
> is added not by Debezium or flink-cdc, but by Kafka Connect when it is used
> with Debezium proper to write CDC messages to Kafka.
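[Editor's note: to make the difference between the two message shapes concrete, here is a minimal Python sketch. The schemaless message mirrors the flink-cdc example in the thread; the enveloped message follows the shape Kafka Connect's JsonConverter produces when `value.converter.schemas.enable=true` (a `schema` block alongside a `payload` block). The `inferred_type` helper is hypothetical — it only illustrates the STRING fallback the Paimon docs describe, not Paimon's actual implementation.]

```python
import json

# Schemaless message, as produced by the flink-cdc Kafka pipeline sink
# (shape copied from the example quoted above).
flink_cdc_msg = json.loads("""
{
  "before": null,
  "after": {"rev_id": 37},
  "op": "c"
}
""")

# Kafka Connect-style envelope with value.converter.schemas.enable=true:
# the record is wrapped in "payload", and a sibling "schema" block
# describes its field types.
connect_msg = json.loads("""
{
  "schema": {
    "type": "struct",
    "fields": [
      {"field": "after", "type": "struct",
       "fields": [{"field": "rev_id", "type": "int64"}]}
    ]
  },
  "payload": {"before": null, "after": {"rev_id": 37}, "op": "c"}
}
""")

def inferred_type(msg: dict, field: str) -> str:
    """Hypothetical helper: use the declared type when a 'schema' block
    is present, otherwise fall back to STRING (the behavior the Paimon
    docs describe for schemaless debezium-json)."""
    schema = msg.get("schema")
    if schema is None:
        return "STRING"
    for f in schema.get("fields", []):
        if f["field"] == "after":
            for col in f.get("fields", []):
                if col["field"] == field:
                    return col["type"].upper()
    return "STRING"

print(inferred_type(flink_cdc_msg, "rev_id"))  # no schema block -> STRING
print(inferred_type(connect_msg, "rev_id"))    # declared type -> INT64
```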
>
> Does this mean that the CDC messages generated by the flink-cdc Kafka
> pipeline connector are not compatible with Paimon's kafka_sync_database
> action?
> Or, is there a way to make the flink-cdc pipeline connectors include a
> schema in the message?
>
> I could be misunderstanding something. Is Kafka Connect + Debezium used by
> Flink to support debezium-json formatted messages? I tried passing
> properties.value.converter.schemas.enable: true to the flink-cdc pipeline
> Kafka sink, but that did not work (as expected).
>
> Thank you!
>
> -Andrew Otto
>  Wikimedia Foundation
>
> P.S. Context for what we are trying to do is here: T373144 [SPIKE] Learn and
> document how to use Flink-CDC from MediaWiki MariaDB locally