Good news. There is now a pull request available:
https://github.com/apache/iceberg/pull/12424

-Max

On Thu, Jan 2, 2025 at 4:46 AM Anil Dasari
<dasaria...@myyahoo.com.invalid> wrote:
>
>  Hello Max, Peter, happy new year.
> Could you share the repository, even if it's not fully complete, so I can
> extend it further for my POC?
>
> Thanks
>
> On Tuesday, December 3, 2024 at 02:01:46 AM PST, Anil Dasari
> <dasaria...@myyahoo.com> wrote:
>
> Hi Max,
>
> That's great to hear. Thank you for the update. I look forward to
> reviewing the code when it's available.
>
> Thanks,
> Anil
>     On Thursday, November 28, 2024 at 04:16:32 AM PST, Maximilian Michels 
> <m...@apache.org> wrote:
>
>  Hi Anil,
>
> Great. The Dynamic Iceberg Sink will address your feature request. We'll
> soon share the code for review.
>
> Thanks,
> Max
>
> On Wed, Nov 20, 2024 at 11:16 PM Anil Dasari <dasaria...@myyahoo.com.invalid>
> wrote:
>
> >  Hi Peter, is there a repository we can begin using for testing or
> > contributing? Thanks.
> >
> > +1 on the feature. I raised a similar request to the Iceberg community a
> > month ago: Add support for multiple table DataStream FlinkSink · Issue
> > #11436 · apache/iceberg
> >
> > Thanks
> >
> > On Tuesday, November 19, 2024 at 01:12:38 AM PST, ConradJam
> > <czy...@apache.org> wrote:
> >
> >  +1. I have recently been working on a Flink CDC to Iceberg pipeline,
> > and this feature would be helpful for my work.
> >
> > Ferenc Csaky <ferenc.cs...@pm.me.invalid> wrote on Thu, Nov 14, 2024 at
> > 21:16:
> >
> > > Hi Peter, Max,
> > >
> > > Thanks for starting this discussion, +1 for the proposal. I agree
> > > that the added functionality would solve some major pain points for
> > > Flink -> Iceberg use-cases!
> > >
> > > Best,
> > > Ferenc
> > >
> > >
> > >
> > >
> > > On Thursday, November 14th, 2024 at 03:54, Congxian Qiu <
> > > qcx978132...@gmail.com> wrote:
> > >
> > > >
> > > >
> > > > Thanks for kicking off this discussion, +1 for this proposal.
> > > >
> > > > Currently, we use FlinkCDC as the ingestion tool to write to Iceberg,
> > > > and we need to restart the job if there is any schema evolution. I
> > > > look forward to seeing this integrated with FlinkCDC.
> > > >
> > > > Best,
> > > > Congxian
> > > >
> > > >
> > > > Leonard Xu <xbjt...@gmail.com> wrote on Thu, Nov 14, 2024 at 09:37:
> > > >
> > > > > Thanks Peter and Max for kicking off this discussion, +1 for this
> > > proposal.
> > > > >
> > > > > The Dynamic Sink is a key improvement for the Iceberg Sink, and it
> > > > > can be well integrated with Flink CDC's pipeline sink in the future
> > > > > as well.
> > > > >
> > > > > Best,
> > > > > Leonard
> > > > >
> > > > > > On Nov 13, 2024 at 7:57 PM, Maximilian Michels <m...@apache.org> wrote:
> > > > > >
> > > > > > Thanks for driving this, Peter! The Dynamic Iceberg Sink is meant
> > > > > > to solve several pain points of the current Flink Iceberg sink.
> > > > > > Note that the Iceberg sink lives in Apache Iceberg, not Apache
> > > > > > Flink, although that may change one day when the interfaces are
> > > > > > stable enough.
> > > > > >
> > > > > > From the user perspective, there are three main advantages to
> > > > > > using the Dynamic Iceberg sink:
> > > > > >
> > > > > > 1. It supports writing to any number of tables (no more 1:1
> > > > > >    sink/topic relationship).
> > > > > > 2. It can dynamically create and update tables based on the
> > > > > >    user-supplied routing.
> > > > > > 3. It can dynamically update the schema and partitioning of
> > > > > >    tables based on the user-supplied specification.
> > > > > >
> > > > > > I'd say these features will have a big impact on Flink Iceberg
> > > > > > users.
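
As a rough sketch of how these three points could surface in user code:
the names below (DynamicIcebergSink, DynamicRecord, generator) are assumed
for illustration only and are not taken from the pull request; only
TableIdentifier and CatalogLoader are existing Iceberg classes.

    // Assumed, illustrative API -- not the actual classes from the PR.
    // Each element is converted into a record that carries its own target
    // table, schema, and partition spec, so a single sink can write to
    // any number of tables (1), create them on demand (2), and evolve
    // their schema/partitioning (3).
    DataStream<CdcEvent> events = env.fromSource(source, wm, "cdc-events");

    DynamicIcebergSink.forInput(events)
        .generator((CdcEvent event, Collector<DynamicRecord> out) ->
            out.collect(new DynamicRecord(
                TableIdentifier.of("db", event.tableName()), // per-record routing
                event.icebergSchema(),   // schema this record was written with
                event.partitionSpec(),   // desired partitioning
                event.rowData())))       // the payload to write
        .catalogLoader(CatalogLoader.hadoop("catalog", hadoopConf, props))
        .append();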
> > > > > >
> > > > > > Cheers,
> > > > > > Max
> > > > > >
> > > > > > On Wed, Nov 13, 2024 at 12:27 PM Péter Váry
> > > > > > <peter.vary.apa...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Team,
> > > > > > >
> > > > > > > With Max Michels, we started to work on enhancing the current
> > > > > > > Iceberg Sink to allow inserting evolving records into a
> > > > > > > changing table.
> > > > > > >
> > > > > > > Created the design doc [1]
> > > > > > > Created the Iceberg proposal [2]
> > > > > > > Started the conversation on the Iceberg dev list [3]
> > > > > > >
> > > > > > > From the abstract:
> > > > > > > ---------
> > > > > > > The Flink Iceberg connector sink is the tool to write data to
> > > > > > > an Iceberg table from a continuous Flink stream. The current
> > > > > > > Sink implementations emphasize throughput over flexibility.
> > > > > > > The main limiting factor is that the Iceberg Flink Sink
> > > > > > > requires a static table structure: the table, the schema, and
> > > > > > > the partitioning specification need to be constant. If any of
> > > > > > > these changes, the Flink job needs to be restarted. This
> > > > > > > allows optimal record serialization and good performance, but
> > > > > > > real-life use cases need to work around this limitation when
> > > > > > > the underlying table has changed. We need to provide a tool
> > > > > > > to accommodate these changes. [..]
> > > > > > >
> > > > > > > The following typical use cases are considered in this design:
> > > > > > >
> > > > > > > - The schema of the incoming Avro records changes (new columns
> > > > > > >   are added, or other backward compatible changes happen). The
> > > > > > >   Flink job is expected to update the table schema dynamically
> > > > > > >   and continue to ingest data with both the new and the old
> > > > > > >   schema without a job restart.
> > > > > > > - Incoming records define the target Iceberg table
> > > > > > >   dynamically. The Flink job is expected to create the new
> > > > > > >   table(s) and continue writing to them without a job restart.
> > > > > > > - The partitioning schema of the table changes. The Flink job
> > > > > > >   is expected to update the specification and continue writing
> > > > > > >   to the target table without a job restart.
> > > > > > > ---------
> > > > > > >
> > > > > > > If you have any questions, ideas, or suggestions, please let
> > > > > > > us know on any of the email threads, or in comments on the
> > > > > > > document.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Peter
> > > > > > >
> > > > > > > [1] - https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s
> > > > > > > [2] - https://github.com/orgs/apache/projects/429
> > > > > > > [3] - https://lists.apache.org/thread/0sbosd8rzbl5rhqs9dtt99xr44kld3qp
> > >
> >
>
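
For a concrete sense of what the first and third use cases in the abstract
mean at the table level, here is a minimal sketch using the existing core
Iceberg API (Table.updateSchema() and Table.updateSpec() are real APIs).
Whether and exactly how the dynamic sink applies such updates internally is
an assumption here; the actual mechanism is described in the design doc [1].

    import org.apache.iceberg.Table;
    import org.apache.iceberg.types.Types;

    class SchemaEvolutionSketch {
        // Use case 1: a backward compatible schema change, e.g. a new
        // optional column starts appearing in the incoming records.
        static void addNewColumn(Table table) {
            table.updateSchema()
                .addColumn("discount", Types.DoubleType.get()) // optional by default
                .commit(); // old- and new-schema writers both keep working
        }

        // Use case 3: the partitioning changes, e.g. new data should be
        // partitioned by an existing 'region' column from now on.
        static void partitionByRegion(Table table) {
            table.updateSpec()
                .addField("region") // identity partition on 'region'
                .commit(); // only affects newly written data files
        }
    }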
