Good news. There is now a pull request available: https://github.com/apache/iceberg/pull/12424
-Max

On Thu, Nov 21, 2024 at 4:36 PM Ferenc Csaky <ferenc.cs...@pm.me.invalid> wrote:
>
> Hello devs,
>
> +1 from my side, as I look at things from the Flink perspective. The Flink
> mailing list thread Peter linked in his previous message already has more
> supporters who agree this feature would be pretty helpful for CDC tasks as
> well.
>
> Multiple users (including us) are looking forward to this feature, and to
> any input from the Iceberg side.
>
> Best,
> Ferenc
>
> On Thursday, November 21st, 2024 at 15:11, Péter Váry
> <peter.vary.apa...@gmail.com> wrote:
>
> Many of the Flink users support the Dynamic Sink.
> See: https://lists.apache.org/thread/khw0z63n34cmh2nrzrx7j9bdmzz861lb
>
> Any comments from the Iceberg community side?
>
> Jean-Baptiste Onofré <j...@nanthrax.net> wrote (on Wed, Nov 13, 2024, 14:06):
>>
>> Thanks for the proposal!
>> I will take a look asap.
>>
>> Regards
>> JB
>>
>> On Mon, Nov 11, 2024 at 6:32 PM Péter Váry <peter.vary.apa...@gmail.com>
>> wrote:
>> >
>> > Hi Team,
>> >
>> > Together with Max Michels, we have started working on enhancing the
>> > current Iceberg Sink to allow inserting evolving records into a changing
>> > table.
>> > See:
>> > https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s
>> > We also created a project to track the lifecycle of the proposal:
>> > https://github.com/orgs/apache/projects/429
>> >
>> > From the abstract:
>> > ---------
>> > The Flink Iceberg connector sink is the tool for writing data to an
>> > Iceberg table from a continuous Flink stream. The current Sink
>> > implementations emphasize throughput over flexibility. The main limiting
>> > factor is that the Iceberg Flink Sink requires a static table structure:
>> > the table, the schema, and the partitioning specification all need to be
>> > constant. If any of these changes, the Flink job needs to be restarted.
>> > This allows optimal record serialization and good performance, but
>> > real-life use cases need to work around this limitation whenever the
>> > underlying table changes. We need to provide a tool that accommodates
>> > these changes.
>> > [..]
>> > The following typical use cases are considered in this design:
>> > - The schema of the incoming Avro records changes (new columns are added,
>> > or other backward-compatible changes happen). The Flink job is expected
>> > to update the table schema dynamically and continue to ingest data with
>> > both the new and the old schema without a job restart.
>> > - Incoming records define the target Iceberg table dynamically. The Flink
>> > job is expected to create the new table(s) and continue writing to them
>> > without a job restart.
>> > - The partitioning specification of the table changes. The Flink job is
>> > expected to update the specification and continue writing to the target
>> > table without a job restart.
>> > ---------
>> >
>> > If you have any questions, ideas, or suggestions, please let us know
>> > here or in comments on the document.
>> >
>> > Thanks,
>> > Peter
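To make the use cases quoted above more concrete, below is a minimal sketch of how the input of such a dynamic sink could be modeled. It is an illustration only: the names DynamicSinkSketch, DynamicRecord, and wire are hypothetical placeholders, not the API defined in the design document or in PR 12424; only the Flink and Iceberg types in the imports are existing classes.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.catalog.TableIdentifier;

public class DynamicSinkSketch {

  // One stream element: the payload plus the metadata the sink needs to route
  // the record and, if necessary, evolve the target table on the fly.
  static class DynamicRecord {
    final TableIdentifier table; // target table, resolved per record (second use case)
    final Schema schema;         // schema of this record, possibly newer than the table's (first use case)
    final PartitionSpec spec;    // desired partitioning, applied if it changed (third use case)
    final RowData row;           // the actual data to write

    DynamicRecord(TableIdentifier table, Schema schema, PartitionSpec spec, RowData row) {
      this.table = table;
      this.schema = schema;
      this.spec = spec;
      this.row = row;
    }
  }

  static void wire(DataStream<DynamicRecord> records) {
    // Conceptually, for every record the dynamic sink would:
    //   1. create the target table if it does not exist yet,
    //   2. apply backward-compatible schema and partition-spec updates,
    //   3. write the row with a writer bound to the resolved table and schema,
    // all without restarting the Flink job.
    // records.sinkTo(...);  // the actual sink builder is defined by the proposal/PR, not sketched here
  }
}

In practice, the DataStream<DynamicRecord> would typically be produced by a map function that inspects each incoming Avro record, resolves its target table and schema, and converts the payload to RowData.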