Hi Yanquan,

We need to upgrade Paimon to 1.0... Iceberg snapshots in Paimon 0.9 do not work well.
Best,
Jingsong

On Mon, Dec 9, 2024 at 5:25 PM Yanquan Lv <decq12y...@gmail.com> wrote:
>
> Hi, John. Happy to hear that you've used this pipeline.
>
> As you said, Paimon's Iceberg Compatibility Mode is a new feature in Paimon
> v0.9. However, the Paimon version that the CDC module depends on is still
> v0.8, so we will need to wait for the bump to v0.9 to support this. I've
> already created a PR [1] for this version bump; perhaps you can help review
> or verify it. We plan to complete this in Flink CDC 3.3.0.
>
> [1] https://github.com/apache/flink-cdc/pull/3644
>
> On Dec 8, 2024, at 15:15, John Mwangi <john.mwa...@live.com> wrote:
>
> Hi Yanquan,
>
> Thanks for your response - we've been able to make a lot of progress on this
> project by switching to Debezium CDC for the time being.
>
> We're also experimenting with Paimon's Iceberg Compatibility Mode in v0.9
> and noticed the following:
>
> DESCRIBE iceberg_catalog.`users_ta_ice.db`.user_2; (works as expected)
> SELECT * FROM iceberg_catalog.`users_ta_ice.db`.user_2; (throws "TableNotExistException")
> SHOW DATABASES IN iceberg_catalog; (database names have a .db suffix)
>
> I'm not sure whether the database names carrying a .db suffix contributes to
> the exception.
>
> Is this an issue that you've come across?
>
> For context, this is the link to our setup on GitHub.
>
> Regards,
> John Mwangi
> ________________________________
> From: Yanquan Lv <decq12y...@gmail.com>
> Sent: 28 October 2024 13:22
> To: Andrew Otto <o...@wikimedia.org>
> Cc: User <user@flink.apache.org>; u...@paimon.apache.org <u...@paimon.apache.org>; John Mwangi <john.mwa...@live.com>
> Subject: Re: Flink CDC -> Kafka -> Paimon?
>
> Hi, Andrew.
> Yes, currently the output from the Kafka pipeline doesn't contain schema
> info, so in the Paimon action it will be treated as STRING type.
> Your suggestion is very meaningful.
> I plan to support this feature in the next version of Flink CDC (Flink CDC
> 3.3), which may be enabled through a parameter [1].
>
> Also, a Paimon sink is already available in Flink CDC, so we can directly
> write data from MariaDB to Paimon to reduce the number of components and
> links that need to be maintained. We will also follow up on any issues
> encountered in the Paimon pipeline sink.
>
> [1] https://issues.apache.org/jira/browse/FLINK-36611
>
> On Oct 26, 2024, at 11:32, Andrew Otto <o...@wikimedia.org> wrote:
>
> Hi!
>
> I really like Flink CDC's pipeline connectors! So simple!
> I also like Paimon's CDC ingestion action CLI.
>
> I like these because I don't need to specify the schemas; they are inferred
> from the source. I also like the schema evolution support!
>
> Paimon's recent Iceberg Compatibility Mode looks cool!
>
> I'd like to accomplish the following:
>
> MariaDB CDC -> Kafka
> Kafka -> Paimon
> Query with Spark+Iceberg
>
> I can do MariaDB CDC -> Kafka with the flink-cdc pipeline connector.
> However, I'm stuck on Kafka -> Paimon.
>
> Paimon's Kafka CDC action docs say:
>
> > Usually, debezium-json contains a ‘schema’ field, from which Paimon will
> > retrieve data types. Make sure your debezium json has this field, or
> > Paimon will use ‘STRING’ type.
>
> However, the messages generated by flink-cdc's pipeline connector do not
> have a schema field. They look like:
>
> {
>   "before": null,
>   "after": {
>     "rev_id": 37,
>     ...
>   },
>   "op": "c"
> }
>
> The only reference I can find to a schema field is in the debezium-json
> format documentation:
>
> > users may setup the Debezium Kafka Connect with the Kafka configuration
> > 'value.converter.schemas.enable' enabled to include schema in the message
>
> This leads me to believe that the schema field Paimon's doc is referring to
> is added not by Debezium or flink-cdc, but by Kafka Connect when it is used
> with Debezium proper to write CDC messages to Kafka.
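[Editor's note: to make the difference between the two message shapes concrete, here is a minimal Python sketch. The schemaless message mirrors the flink-cdc example in the thread; the enveloped message follows the shape Kafka Connect's JsonConverter produces when `value.converter.schemas.enable=true` (a `schema` block alongside a `payload` block). The `inferred_type` helper is hypothetical — it only illustrates the STRING fallback the Paimon docs describe, not Paimon's actual implementation.]

```python
import json

# Schemaless message, as produced by the flink-cdc Kafka pipeline sink
# (shape copied from the example quoted above).
flink_cdc_msg = json.loads("""
{
  "before": null,
  "after": {"rev_id": 37},
  "op": "c"
}
""")

# Kafka Connect-style envelope with value.converter.schemas.enable=true:
# the record is wrapped in "payload", and a sibling "schema" block
# describes its field types.
connect_msg = json.loads("""
{
  "schema": {
    "type": "struct",
    "fields": [
      {"field": "after", "type": "struct",
       "fields": [{"field": "rev_id", "type": "int64"}]}
    ]
  },
  "payload": {"before": null, "after": {"rev_id": 37}, "op": "c"}
}
""")

def inferred_type(msg: dict, field: str) -> str:
    """Hypothetical helper: use the declared type when a 'schema' block
    is present, otherwise fall back to STRING (the behavior the Paimon
    docs describe for schemaless debezium-json)."""
    schema = msg.get("schema")
    if schema is None:
        return "STRING"
    for f in schema.get("fields", []):
        if f["field"] == "after":
            for col in f.get("fields", []):
                if col["field"] == field:
                    return col["type"].upper()
    return "STRING"

print(inferred_type(flink_cdc_msg, "rev_id"))  # no schema block -> STRING
print(inferred_type(connect_msg, "rev_id"))    # declared type -> INT64
```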
>
> Does this mean that the CDC messages generated by the flink-cdc Kafka
> pipeline connector are not compatible with Paimon's kafka_sync_database
> action?
> Or, is there a way to make the flink-cdc pipeline connectors include a
> schema in the message?
>
> I could be misunderstanding something. Is Kafka Connect + Debezium used by
> Flink to support debezium-json formatted messages? I tried passing
> properties.value.converter.schemas.enable: true to the flink-cdc pipeline
> Kafka sink, but that did not work (as expected).
>
> Thank you!
>
> -Andrew Otto
>  Wikimedia Foundation
>
> P.S. Context for what we are trying to do is here: T373144 [SPIKE] Learn and
> document how to use Flink-CDC from MediaWiki MariaDB locally