Hi Himanshu,

If you want to try Flink + Iceberg for syncing MySQL binlog to an Iceberg table, you might be interested in these PRs:
1. https://github.com/apache/iceberg/pull/2410
2. https://github.com/apache/iceberg/pull/2303

On Wed, Mar 24, 2021 at 10:34 AM OpenInx <open...@gmail.com> wrote:

> Hi Himanshu
>
> Thanks for the email. Currently Flink + Iceberg supports writing CDC
> events into an Apache Iceberg table through the Flink DataStream API, and
> Spark/Presto/Hive can read those events in batch jobs.
>
> But there are still some issues that we have not finished yet:
>
> 1. Expose Iceberg format v2 to end users. The row-level delete feature is
> actually built on Iceberg format v2, and there are still some blockers
> that we need to fix (please see the document
> https://docs.google.com/document/d/1FyLJyvzcZbfbjwDMEZd6Dj-LYCfrzK1zC-Bkb3OiICc/edit).
> The Iceberg team will need some resources to resolve them.
> 2. CDC events depend on Iceberg primary key identification (so that we
> could define a mysql_cdc SQL table using a primary key clause). I saw Jack
> Ye has published a PR for this, https://github.com/apache/iceberg/pull/2354,
> and I will review it today.
> 3. The CDC writers will inevitably produce many small files as the
> periodic checkpoints go on, so for a real production environment we must
> provide the ability to rewrite small files into larger files (a compaction
> action). There are a few PRs that need review:
>    a. https://github.com/apache/iceberg/pull/2303/files
>    b. https://github.com/apache/iceberg/pull/2294
>    c. https://github.com/apache/iceberg/pull/2216
>
> I think it's better to resolve all those issues before we put production
> data into Iceberg (syncing MySQL binlog via Debezium). I saw the last sync
> notes saying the next release, 0.12.0, would ideally be released at the
> end of this month
> (https://lists.apache.org/x/thread.html/rdb7d1ab221295adec33cf93dcbcac2b9b7b80708b2efd903b7105511@%3Cdev.iceberg.apache.org%3E),
> but I think that deadline is too tight.
> In my mind, if the 0.12.0 release won't expose format v2 to end users,
> then what are the core features that we want to release? If the features
> that we plan to release are not major ones, then how about releasing
> 0.11.2?
>
> According to my understanding of the needs of community users, the vast
> majority of Iceberg users have high expectations for format v2. I think we
> may need to raise v2 exposure to a higher priority so that our users can
> do full PoC tests earlier.
>
> On Wed, Mar 24, 2021 at 3:49 AM Himanshu Rathore
> <himanshu.rath...@zomato.com.invalid> wrote:
>
>> We are planning to use Flink + Iceberg for syncing MySQL binlogs via
>> Debezium, and it seems a lot of things depend on the next release.
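Point 3 in the quoted reply describes compaction: rewriting the many small files produced by per-checkpoint commits into fewer, larger files. As a rough illustration only, here is a Python sketch that groups small files into bins of about a target size using first-fit-decreasing bin packing. The function `plan_compaction`, its parameters, and the sizes used are hypothetical; this is not the planning logic in the compaction PRs listed in the thread, just a simplified model of the idea.

```python
from typing import List


def plan_compaction(
    file_sizes: List[int], target_size: int, small_file_threshold: int
) -> List[List[int]]:
    """Group small data files into bins of at most target_size.

    Hypothetical helper: a first-fit-decreasing bin-packing sketch,
    not Iceberg's actual rewrite planner. Sizes use any consistent unit.
    """
    # Only files below the threshold are candidates for rewriting.
    small = sorted((s for s in file_sizes if s < small_file_threshold), reverse=True)

    bins: List[List[int]] = []
    totals: List[int] = []
    for size in small:
        # Put the file into the first bin that still has room for it.
        for i, total in enumerate(totals):
            if total + size <= target_size:
                bins[i].append(size)
                totals[i] += size
                break
        else:
            # No existing bin has room, so open a new one.
            bins.append([size])
            totals.append(size)

    # A bin holding a single file gains nothing from a rewrite, so drop it.
    return [b for b in bins if len(b) > 1]


if __name__ == "__main__":
    # Ten 8 MB checkpoint files plus one already-large 200 MB file.
    print(plan_compaction([8] * 10 + [200], target_size=32, small_file_threshold=64))
```

With a 32 MB target and a 64 MB small-file threshold, the ten 8 MB files are planned as two bins of four files and one bin of two, and the 200 MB file is left alone. Each bin would then be rewritten as one larger file.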