Thanks Peter and Max for kicking off this discussion. +1 for this proposal.

The Dynamic Sink is a key improvement to the Iceberg Sink, and it can integrate
well with Flink CDC's pipeline sink in the future as well.

Best,
Leonard


> On Nov 13, 2024, at 7:57 PM, Maximilian Michels <m...@apache.org> wrote:
> 
> Thanks for driving this Peter! The Dynamic Iceberg Sink is meant to solve
> several pain points of the current Flink Iceberg sink. Note that the
> Iceberg sink lives in Apache Iceberg, not Apache Flink, although that may
> change one day when the interfaces are stable enough.
> 
> From the user perspective, there are three main advantages to using the
> Dynamic Iceberg sink:
> 
> 1. It supports writing to any number of tables (no more 1:1 sink/topic
> relationship).
> 2. It can dynamically create and update tables based on the user-supplied
> routing.
> 3. It can dynamically update the schema and partitioning of tables based on
> the user-supplied specification.
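> 
> To make this a bit more tangible, here is a rough sketch of how a job could
> wire up such a sink. None of the class or method names below are a committed
> API; they are placeholders to illustrate the idea, and `env`, `source` and
> `watermarks` are assumed to be set up elsewhere:
> 
>   // Hypothetical sketch only – DynamicIcebergSink, DynamicWriteTarget and
>   // withTargetResolver are illustrative names, not an existing Iceberg API.
>   DataStream<MyEvent> events = env.fromSource(source, watermarks, "events");
> 
>   DynamicIcebergSink.forInput(events)
>       // Per record: pick the target table and describe the expected schema
>       // and partitioning, so the sink can create or update tables as needed.
>       .withTargetResolver(event -> DynamicWriteTarget.of(
>           TableIdentifier.of("db", event.getTableName()),
>           event.getIcebergSchema(),
>           event.getPartitionSpec()))
>       .append();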
> 
> I'd say these features will have a big impact on Flink Iceberg users.
> 
> Cheers,
> Max
> 
> On Wed, Nov 13, 2024 at 12:27 PM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
> 
>> Hi Team,
>> 
>> Together with Max Michels, I have started working on enhancing the current
>> Iceberg Sink to allow inserting evolving records into a changing table.
>> 
>> Created the design doc [1]
>> Created the Iceberg proposal [2]
>> Started the conversation on the Iceberg dev list [3]
>> 
>> From the abstract:
>> ---------
>> The Flink Iceberg connector sink is the tool to write data to an Iceberg
>> table from a continuous Flink stream. The current Sink implementations
>> emphasize throughput over flexibility. The main limiting factor is that the
>> Iceberg Flink Sink requires a static table structure: the table, the schema,
>> and the partitioning specification need to be constant. If any of these
>> changes, the Flink job needs to be restarted. This allows optimal record
>> serialization and good performance, but real-life use cases need to work
>> around this limitation when the underlying table changes. We need to provide
>> a tool to accommodate these changes.
>> [..]
>> The following typical use cases are considered during this design:
>> - The schema of the incoming Avro records changes (new columns are added, or
>> other backward compatible changes happen). The Flink job is expected to
>> update the table schema dynamically and continue to ingest data with both
>> the new and the old schema without a job restart.
>> - Incoming records define the target Iceberg table dynamically. The Flink
>> job is expected to create the new table(s) and continue writing to them
>> without a job restart.
>> - The partitioning specification of the table changes. The Flink job is
>> expected to update the specification and continue writing to the target
>> table without a job restart.
>> ---------
>> If you have any questions, ideas, or suggestions, please let us know on any
>> of the email threads, or in the comments on the document.
>> 
>> Thanks,
>> Peter
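>> 
>> To make the schema and partition evolution cases above a bit more concrete:
>> today such table changes have to be applied out of band, followed by a job
>> restart, while the dynamic sink is meant to apply equivalent updates
>> automatically when it detects them. A small sketch using the existing
>> Iceberg table API (the table and column names are made up for the example,
>> and `catalog` is assumed to be an already configured Iceberg Catalog):
>> 
>>   // Sketch only: roughly the kind of updates the dynamic sink is meant to
>>   // automate; all identifiers here are illustrative.
>>   import org.apache.iceberg.Table;
>>   import org.apache.iceberg.catalog.TableIdentifier;
>>   import org.apache.iceberg.expressions.Expressions;
>>   import org.apache.iceberg.types.Types;
>> 
>>   Table table = catalog.loadTable(TableIdentifier.of("db", "events"));
>> 
>>   // Backward compatible schema change: add a new optional column.
>>   table.updateSchema()
>>       .addColumn("new_attribute", Types.StringType.get())
>>       .commit();
>> 
>>   // Partition evolution: start partitioning by day of the event timestamp.
>>   table.updateSpec()
>>       .addField(Expressions.day("event_ts"))
>>       .commit();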
>> 
>> [1] -
>> 
>> https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s
>> [2] - https://github.com/orgs/apache/projects/429
>> [3] - https://lists.apache.org/thread/0sbosd8rzbl5rhqs9dtt99xr44kld3qp
>> 
