Thanks Peter and Max for kicking off this discussion. +1 for this proposal.
The Dynamic Sink is a key improvement for the Iceberg Sink, and it can be well integrated with Flink CDC's pipeline sink in the future as well.

Best,
Leonard

> On Nov 13, 2024, at 7:57 PM, Maximilian Michels <m...@apache.org> wrote:
>
> Thanks for driving this Peter! The Dynamic Iceberg Sink is meant to solve
> several pain points of the current Flink Iceberg sink. Note that the
> Iceberg sink lives in Apache Iceberg, not Apache Flink, although that may
> change one day when the interfaces are stable enough.
>
> From the user perspective, there are three main advantages to using the
> Dynamic Iceberg sink:
>
> 1. It supports writing to any number of tables (no more 1:1 sink/topic
> relationship).
> 2. It can dynamically create and update tables based on the user-supplied
> routing.
> 3. It can dynamically update the schema and partitioning of tables based on
> the user-supplied specification.
>
> I'd say these features will have a big impact on Flink Iceberg users.
>
> Cheers,
> Max
>
> On Wed, Nov 13, 2024 at 12:27 PM Péter Váry <peter.vary.apa...@gmail.com>
> wrote:
>
>> Hi Team,
>>
>> With Max Michels, we started to work on enhancing the current Iceberg
>> Sink to allow inserting evolving records into a changing table.
>>
>> Created the design doc [1]
>> Created the Iceberg proposal [2]
>> Started the conversation on the Iceberg dev list [3]
>>
>> From the abstract:
>> ---------
>> The Flink Iceberg connector sink is the tool to write data to an Iceberg
>> table from a continuous Flink stream. The current Sink implementations
>> emphasize throughput over flexibility. The main limiting factor is that
>> the Iceberg Flink Sink requires a static table structure: the table, the
>> schema, and the partitioning specification need to be constant. If any of
>> these changes, the Flink job needs to be restarted. This allows optimal
>> record serialization and good performance, but real-life use cases need
>> to work around this limitation when the underlying table has changed. We
>> need to provide a tool to accommodate these changes.
>> [..]
>> The following typical use cases are considered during this design:
>> - The schema of the incoming Avro records changes (new columns are added,
>> or other backward-compatible changes happen). The Flink job is expected
>> to update the table schema dynamically and continue to ingest data with
>> the new and the old schema without a job restart.
>> - Incoming records define the target Iceberg table dynamically. The Flink
>> job is expected to create the new table(s) and continue writing to them
>> without a job restart.
>> - The partitioning specification of the table changes. The Flink job is
>> expected to update the specification and continue writing to the target
>> table without a job restart.
>> ---------
>> If you have any questions, ideas, or suggestions, please let us know on
>> any of the email threads, or in comments on the document.
>>
>> Thanks,
>> Peter
>>
>> [1] - https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s
>> [2] - https://github.com/orgs/apache/projects/429
>> [3] - https://lists.apache.org/thread/0sbosd8rzbl5rhqs9dtt99xr44kld3qp
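
To make the per-record routing use case above concrete, here is a minimal, purely hypothetical Java sketch: each incoming event carries enough information to decide its target table and schema at runtime. The DynamicRecord, TenantRouter, and Event classes are invented for this illustration and are not the API from the design doc; only the imported Flink (MapFunction, RowData) and Iceberg (TableIdentifier, Schema, Types) classes are real.

    // Hypothetical illustration only: DynamicRecord, TenantRouter, and Event
    // are made up for this sketch and are NOT the proposed API.
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.table.data.GenericRowData;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.data.StringData;
    import org.apache.iceberg.Schema;
    import org.apache.iceberg.catalog.TableIdentifier;
    import org.apache.iceberg.types.Types;

    /** Payload plus the destination resolved per record (hypothetical). */
    class DynamicRecord {
      final TableIdentifier table; // target table, possibly not created yet
      final Schema schema;         // schema this particular record conforms to
      final RowData row;           // the row to write

      DynamicRecord(TableIdentifier table, Schema schema, RowData row) {
        this.table = table;
        this.schema = schema;
        this.row = row;
      }
    }

    /** Routes each event to a per-tenant table derived from the event itself. */
    class TenantRouter implements MapFunction<Event, DynamicRecord> {
      private static final Schema SCHEMA = new Schema(
          Types.NestedField.required(1, "id", Types.LongType.get()),
          Types.NestedField.optional(2, "payload", Types.StringType.get()));

      @Override
      public DynamicRecord map(Event event) {
        // The table name is derived from the data, so a new tenant maps to a
        // new table without restarting the job.
        TableIdentifier table =
            TableIdentifier.of("analytics", "events_" + event.tenant);
        GenericRowData row = new GenericRowData(2);
        row.setField(0, event.id);
        row.setField(1, StringData.fromString(event.payload));
        return new DynamicRecord(table, SCHEMA, row);
      }
    }

    /** Minimal input event type for the sketch. */
    class Event {
      long id;
      String tenant;
      String payload;
    }

A dynamic sink consuming such a stream would then maintain writers per target table, creating tables and evolving schemas as new destinations and record shapes appear; how that is actually done is exactly what the design doc [1] and the Iceberg proposal [2] cover.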