Thanks for the proposal! I will take a look ASAP. Regards, JB
On Mon, Nov 11, 2024 at 6:32 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>
> Hi Team,
>
> Together with Max Michels, we have started working on enhancing the current Iceberg Sink to
> allow inserting evolving records into a changing table.
> See:
> https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s
> We created a project to track the lifecycle of the proposal:
> https://github.com/orgs/apache/projects/429
>
> From the abstract:
> ---------
> The Flink Iceberg connector sink is the tool for writing data to an Iceberg table
> from a continuous Flink stream. The current Sink implementations emphasize
> throughput over flexibility. The main limiting factor is that the Iceberg
> Flink Sink requires a static table structure: the table, the schema, and the
> partitioning specification all need to be constant. If any of these changes,
> the Flink job needs to be restarted. This allows optimal record serialization
> and good performance, but real-life use cases need to work around this
> limitation when the underlying table changes. We need to provide a tool that
> accommodates these changes.
> [..]
> The following typical use cases are considered in this design:
> - The schema of the incoming Avro records changes (new columns are added, or other
> backward-compatible changes happen). The Flink job is expected to update the
> table schema dynamically and continue ingesting data with both the new and the
> old schema without a job restart.
> - Incoming records define the target Iceberg table dynamically. The Flink job
> is expected to create the new table(s) and continue writing to them without a
> job restart.
> - The partitioning specification of the table changes. The Flink job is expected to
> update the specification and continue writing to the target table without a
> job restart.
> ---------
>
> If you have any questions, ideas, or suggestions, please let us know here, or in
> comments on the document.
>
> Thanks,
> Peter
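
For readers less familiar with the limitation the proposal addresses, below is a minimal sketch of how the existing (static) Flink Iceberg sink is wired up today. The target table, and therefore its schema and partition spec, is fixed when the job graph is built, which is exactly why a table change currently forces a job restart. The warehouse path, job name, and the buildSource helper are illustrative placeholders, not part of the proposal; the new dynamic sink API is defined in the linked design document, not here.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.iceberg.flink.TableLoader;
import org.apache.iceberg.flink.sink.FlinkSink;

public class StaticIcebergSinkExample {

  public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Source of RowData records; their layout must match the target table's current schema.
    DataStream<RowData> rows = buildSource(env);

    // The target table is resolved once, when the job is built. A schema or
    // partition-spec change in this table requires rebuilding and restarting the job,
    // which is the restriction the dynamic sink proposal aims to remove.
    TableLoader tableLoader = TableLoader.fromHadoopTable("hdfs://warehouse/db/events");

    FlinkSink.forRowData(rows)
        .tableLoader(tableLoader)
        .append();

    env.execute("static-iceberg-sink");
  }

  // Placeholder: constructing a real RowData source is out of scope for this sketch.
  private static DataStream<RowData> buildSource(StreamExecutionEnvironment env) {
    throw new UnsupportedOperationException("provide a real RowData source");
  }
}
```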