Hi OpenInx,

With https://github.com/apache/iceberg/pull/2303 and a potential sequence 
number based fix for https://github.com/apache/iceberg/issues/2308, I don't 
see any hard blockers to testing out row-level deletions. Please correct me 
if anything else in https://github.com/apache/iceberg/milestone/4 is a 
must-have.

Is it possible to separate the flink+iceberg CDC changes from row-level 
deletions in future releases so that the community can have V2 earlier?
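For context, the kind of CDC pipeline being discussed might look roughly like 
the Flink SQL below. This is only a sketch: the connector options, catalog, 
table, and column names are all made up, and it assumes the flink mysql-cdc 
connector and an iceberg catalog are already configured.

```sql
-- Hypothetical changelog source via the flink mysql-cdc connector;
-- every name and option value here is a placeholder.
CREATE TABLE orders_binlog (
  id BIGINT,
  status STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'mysql-host',
  'username' = 'user',
  'password' = 'secret',
  'database-name' = 'shop',
  'table-name' = 'orders'
);

-- Stream the changelog into an iceberg table (which would need to be
-- format v2 so that row-level deletes can be applied), assuming a
-- catalog named iceberg_catalog is already registered.
INSERT INTO iceberg_catalog.db.orders
SELECT * FROM orders_binlog;
```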

Thanks,
Huadong

On 2021/03/24 02:34:23, OpenInx <open...@gmail.com> wrote: 
> Hi Himanshu
> 
> Thanks for the email. Currently, flink+iceberg supports writing CDC
> events into an apache iceberg table via the flink DataStream API, and
> spark/presto/hive can read those events in batch jobs.
> 
> But there are still some issues that we have not finished yet:
> 
> 1.  Expose iceberg v2 to end users.  The row-level delete feature is
> actually built on iceberg format v2, and there are still some blockers
> that we need to fix (please see the document
> https://docs.google.com/document/d/1FyLJyvzcZbfbjwDMEZd6Dj-LYCfrzK1zC-Bkb3OiICc/edit);
> the iceberg team will need some resources to resolve them.
> 2.  As we know, CDC events depend on iceberg primary key identification
> (so that we can define a mysql_cdc SQL table using a primary key clause).
> I saw Jack Ye has published a PR for this,
> https://github.com/apache/iceberg/pull/2354, and I will review it today.
> 3.  The CDC writers will inevitably produce many small files as the
> periodic checkpoints go on, so for a real production env we must provide
> the ability to rewrite small files into larger files (a compaction
> action).  There are a few PRs that need reviewing:
>        a.  https://github.com/apache/iceberg/pull/2303/files
>        b.  https://github.com/apache/iceberg/pull/2294
>        c.  https://github.com/apache/iceberg/pull/2216
> 
> I think it's better to resolve all those issues before we put production
> data into iceberg (syncing mysql binlog via debezium).  I saw the last
> sync notes saying the next release, 0.12.0, would ideally go out by the
> end of this month (
> https://lists.apache.org/x/thread.html/rdb7d1ab221295adec33cf93dcbcac2b9b7b80708b2efd903b7105511@%3Cdev.iceberg.apache.org%3E),
> but I think that deadline is too tight.  In my mind, if release 0.12.0
> won't expose format v2 to end users, then what are the core features that
> we want to release?  If the features we plan to release are not major
> ones, then how about releasing 0.11.2 instead?
> 
> According to my understanding of the needs of community users, the vast
> majority of iceberg users have high expectations for format v2. I think we
> may need to raise the priority of exposing v2 so that our users can run
> full PoC tests earlier.
> 
> 
> 
> On Wed, Mar 24, 2021 at 3:49 AM Himanshu Rathore
> <himanshu.rath...@zomato.com.invalid> wrote:
> 
> > We are planning to use Flink + Iceberg for syncing mysql binlogs via
> > debezium, and it seems some things are dependent on the next release.
> >
> 
