Hi Himanshu,

Thanks for the email. Currently, Flink + Iceberg supports writing CDC events into an Apache Iceberg table via the Flink DataStream API, and Spark/Presto/Hive can read those events in batch jobs.
But there are still some issues that we have not finished yet:

1. Expose Iceberg format v2 to end users. The row-level delete feature is built on Iceberg format v2, and there are still some blockers we need to fix (please see the document https://docs.google.com/document/d/1FyLJyvzcZbfbjwDMEZd6Dj-LYCfrzK1zC-Bkb3OiICc/edit); the Iceberg team will need some resources to resolve them.

2. As we know, the CDC events depend on Iceberg primary key identification (so that we can define a mysql_cdc SQL table using a PRIMARY KEY clause). I saw Jack Ye has published a PR for this: https://github.com/apache/iceberg/pull/2354. I will review it today.

3. The CDC writers will inevitably produce many small files as the periodic checkpoints go on, so for a real production environment we must provide the ability to rewrite small files into larger ones (a compaction action). There are a few PRs that need review:
   a. https://github.com/apache/iceberg/pull/2303/files
   b. https://github.com/apache/iceberg/pull/2294
   c. https://github.com/apache/iceberg/pull/2216

I think it's better to resolve all of those issues before we put production data into Iceberg (syncing MySQL binlogs via Debezium). I saw the last sync notes saying that the next release, 0.12.0, would ideally be cut at the end of this month (https://lists.apache.org/x/thread.html/rdb7d1ab221295adec33cf93dcbcac2b9b7b80708b2efd903b7105511@%3Cdev.iceberg.apache.org%3E), but I think that deadline is too tight. In my mind, if release 0.12.0 won't expose format v2 to end users, then what are the core features that we want to release? If the features we plan to release are not major ones, then how about releasing 0.11.2 instead? According to my understanding of the needs of community users, the vast majority of Iceberg users have high expectations for format v2. I think we may need to raise the v2 exposure to a higher priority so that our users can run their full PoC tests earlier.
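To make point 2 concrete, here is a rough sketch of what a CDC source table with a primary key looks like in Flink SQL. The table name, columns, and connector options below are made-up placeholders for illustration (the 'mysql-cdc' connector comes from the flink-cdc-connectors project), and the downstream Iceberg primary-key/identifier-field support is exactly what the PR above is still working out, so treat this as an assumption rather than a working end-to-end setup:

```sql
-- Hypothetical Flink SQL DDL: a MySQL CDC source table whose PRIMARY KEY
-- would let an Iceberg v2 sink identify rows for row-level upserts/deletes.
CREATE TABLE mysql_cdc_orders (
  order_id BIGINT,
  customer STRING,
  amount   DECIMAL(10, 2),
  PRIMARY KEY (order_id) NOT ENFORCED  -- Flink SQL requires NOT ENFORCED
) WITH (
  'connector'     = 'mysql-cdc',
  'hostname'      = 'localhost',
  'port'          = '3306',
  'username'      = 'flink',
  'password'      = '******',
  'database-name' = 'shop',
  'table-name'    = 'orders'
);
```

With a declaration like this, the binlog-derived +I/-U/+U/-D changelog rows carry a key the sink can use for equality deletes, which is why the primary-key spec work in #2354 matters for the CDC path.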
On Wed, Mar 24, 2021 at 3:49 AM Himanshu Rathore <himanshu.rath...@zomato.com.invalid> wrote: > We are planning for use Flink + Iceberg for syncing mysql binlog's via > debezium and its seams of things are dependent on next release. >