Re: Dynamic Iceberg Sink

Ferenc Csaky Thu, 14 Nov 2024 05:26:27 -0800

Hi Peter, Max,

Thanks for starting this discussion, +1 for the proposal. I agree
that the added functionality would solve some major pain points for
Flink -> Iceberg use-cases!


Best,
Ferenc




On Thursday, November 14th, 2024 at 03:54, Congxian Qiu 
<[email protected]> wrote:

> 
> 
> Thanks for kicking off this discussion, +1 for this proposal.
> 
> Currently, we use FlinkCDC as the ingestion tool to write to Iceberg, and
> need to restart the job whenever there is a schema evolution. Looking
> forward to seeing this integrated with FlinkCDC.
> 
> Best,
> Congxian
> 
> 
> On Thu, Nov 14, 2024 at 09:37, Leonard Xu [email protected] wrote:
> 
> > Thanks Peter and Max for kicking off this discussion, +1 for this proposal.
> > 
> > Dynamic Sink is a key improvement for Iceberg Sink, and it can be well
> > integrated with Flink CDC’s pipeline sink in the future as well.
> > 
> > Best,
> > Leonard
> > 
> > > On Nov 13, 2024, at 7:57 PM, Maximilian Michels [email protected] wrote:
> > > 
> > > Thanks for driving this Peter! The Dynamic Iceberg Sink is meant to solve
> > > several pain points of the current Flink Iceberg sink. Note that the
> > > Iceberg sink lives in Apache Iceberg, not Apache Flink, although that may
> > > change one day when the interfaces are stable enough.
> > > 
> > > From the user perspective, there are three main advantages to using the
> > > Dynamic Iceberg sink:
> > > 
> > > 1. It supports writing to any number of tables (No more 1:1 sink/topic
> > > relationship).
> > > 2. It can dynamically create and update tables based on the user-supplied
> > > routing.
> > > 3. It can dynamically update the schema and partitioning of tables based
> > > on the user-supplied specification.
> > > 
> > > I'd say these features will have a big impact on Flink Iceberg users.
> > > 
> > > Cheers,
> > > Max
> > > 
> > > On Wed, Nov 13, 2024 at 12:27 PM Péter Váry <[email protected]> wrote:
> > > 
> > > > Hi Team,
> > > > 
> > > > With Max Michels, we started to work on enhancing the current Iceberg
> > > > Sink to allow inserting evolving records into a changing table.
> > > > 
> > > > Created the design doc [1]
> > > > Created the Iceberg proposal [2]
> > > > Started the conversation on the Iceberg dev list [3]
> > > > 
> > > > From the abstract:
> > > > 
> > > > ---------
> > > > 
> > > > Flink Iceberg connector sink is the tool to write data to an Iceberg
> > > > table from a continuous Flink stream. The current Sink implementations
> > > > emphasize throughput over flexibility. The main limiting factor is that
> > > > the Iceberg Flink Sink requires a static table structure. The table, the
> > > > schema, and the partitioning specification need to be constant. If any
> > > > of these changes, the Flink job needs to be restarted. This allows using
> > > > optimal record serialization and good performance, but real-life
> > > > use-cases need to work around this limitation when the underlying table
> > > > has changed. We need to provide a tool to accommodate these changes.
> > > > 
> > > > [..]
> > > > 
> > > > The following typical use cases are considered during this design:
> > > > 
> > > > - The schema of incoming Avro records changes (new columns are added, or
> > > >   other backward compatible changes happen). The Flink job is expected
> > > >   to update the table schema dynamically, and continue to ingest data
> > > >   with the new and the old schema without a job restart.
> > > > - Incoming records define the target Iceberg table dynamically. The
> > > >   Flink job is expected to create the new table(s) and continue writing
> > > >   to them without a job restart.
> > > > - The partitioning schema of the table changes. The Flink job is
> > > >   expected to update the specification and continue writing to the
> > > >   target table without a job restart.
> > > > 
> > > > ---------
> > > > 
> > > > If you have any questions, ideas, or suggestions, please let us know on
> > > > any of the email threads, or in comments on the document.
> > > > 
> > > > Thanks,
> > > > Peter
> > > > 
> > > > [1] - https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s
> > > > [2] - https://github.com/orgs/apache/projects/429
> > > > [3] - https://lists.apache.org/thread/0sbosd8rzbl5rhqs9dtt99xr44kld3qp
