Hi Team,

Together with Max Michels, I have started working on enhancing the
current Iceberg Sink to allow inserting evolving records into a changing
table.

- Created the design doc [1]
- Created the Iceberg proposal [2]
- Started the conversation on the Iceberg dev list [3]

From the abstract:

---------
The Flink Iceberg connector sink is the tool for writing data to an
Iceberg table from a continuous Flink stream. The current Sink
implementations emphasize throughput over flexibility. The main limiting
factor is that the Iceberg Flink Sink requires a static table structure:
the table, the schema, and the partitioning specification all need to be
constant. If any of these changes, the Flink job needs to be restarted.
This allows optimal record serialization and good performance, but
real-life use cases need to work around this limitation whenever the
underlying table changes. We need to provide a tool to accommodate these
changes.

[..]

The following typical use cases are considered in this design:
- The schema of the incoming Avro records changes (new columns are
  added, or other backward compatible changes happen). The Flink job is
  expected to update the table schema dynamically, and continue to
  ingest data with both the new and the old schema without a job
  restart.
- Incoming records define the target Iceberg table dynamically. The
  Flink job is expected to create the new table(s) and continue writing
  to them without a job restart.
- The partitioning schema of the table changes. The Flink job is
  expected to update the specification and continue writing to the
  target table without a job restart.
---------
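To make the limitation concrete, below is a minimal sketch of how the
current (static) sink is wired up. This is our own illustration, not
code from the design doc: the table location and sample rows are made-up
placeholders. The key point is that the table identity, schema, and
partition spec are all fixed when the job starts.

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.data.GenericRowData;
    import org.apache.flink.table.data.RowData;
    import org.apache.flink.table.data.StringData;
    import org.apache.flink.table.runtime.typeutils.InternalTypeInfo;
    import org.apache.flink.table.types.logical.IntType;
    import org.apache.flink.table.types.logical.RowType;
    import org.apache.flink.table.types.logical.VarCharType;
    import org.apache.iceberg.flink.TableLoader;
    import org.apache.iceberg.flink.sink.FlinkSink;

    public class StaticIcebergSinkSketch {
      public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Toy input; in a real job this would be e.g. a Kafka/Avro
        // source. The RowData layout must match the table schema exactly.
        DataStream<RowData> rows =
            env.fromElements(
                    (RowData) GenericRowData.of(StringData.fromString("a"), 1),
                    GenericRowData.of(StringData.fromString("b"), 2))
                .returns(InternalTypeInfo.of(
                    RowType.of(new VarCharType(), new IntType())));

        // The target table is resolved from a fixed location (placeholder
        // path). If its schema or partition spec changes later, the job
        // has to be restarted to pick up the new table metadata.
        TableLoader tableLoader =
            TableLoader.fromHadoopTable("hdfs://nn:8020/warehouse/db/tbl");

        FlinkSink.forRowData(rows)
            .tableLoader(tableLoader)
            .append();

        env.execute("static-iceberg-sink");
      }
    }

The proposal is about removing exactly this restriction: deriving the
table, schema, and partition spec from the records themselves and
reacting to their changes at runtime. See the design doc [1] for the
proposed API.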
If you have any questions, ideas, or suggestions, please let us know on
any of the email threads, or in the comments on the document.

Thanks,
Peter

[1] - https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s
[2] - https://github.com/orgs/apache/projects/429
[3] - https://lists.apache.org/thread/0sbosd8rzbl5rhqs9dtt99xr44kld3qp
