Hi Team,

Max Michels and I have started working on enhancing the current Iceberg Sink
to allow inserting evolving records into a changing table.
See:
https://docs.google.com/document/d/1R3NZmi65S4lwnmNjH4gLCuXZbgvZV5GNrQKJ5NYdO9s
We created a project to track the lifecycle of the proposal:
https://github.com/orgs/apache/projects/429

From the abstract:
---------
The Flink Iceberg connector sink is the tool for writing data to an Iceberg
table from a continuous Flink stream. The current sink implementations
emphasize throughput over flexibility. The main limiting factor is that the
Iceberg Flink Sink requires a static table structure: the table, the schema,
and the partitioning specification all need to be constant. If any of these
changes, the Flink job needs to be restarted. This allows optimal record
serialization and good performance, but real-life use cases need to work
around this limitation whenever the underlying table changes. We need to
provide a tool that accommodates these changes.
[..]
The following typical use cases are considered in this design:
- The schema of the incoming Avro records changes (new columns are added, or
other backward-compatible changes happen). The Flink job is expected to
update the table schema dynamically and continue ingesting data with both
the new and the old schema without a job restart.
- The incoming records define the target Iceberg table dynamically. The
Flink job is expected to create the new table(s) and continue writing to
them without a job restart.
- The partitioning specification of the table changes. The Flink job is
expected to update the spec and continue writing to the target table
without a job restart.
---------
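
For context, the static structure mentioned in the abstract is visible in
how the current sink is wired up. Below is a minimal Java sketch using the
existing FlinkSink builder; the HDFS path and table are placeholders and the
surrounding job setup is omitted. The table, its schema, and its partition
spec are all resolved through the TableLoader at job submission time, which
is why any change to them currently forces a job restart.

  import org.apache.flink.streaming.api.datastream.DataStream;
  import org.apache.flink.table.data.RowData;
  import org.apache.iceberg.flink.TableLoader;
  import org.apache.iceberg.flink.sink.FlinkSink;

  public class StaticIcebergSinkSketch {

    // Wires a RowData stream into a single, fixed Iceberg table.
    static void appendToTable(DataStream<RowData> input) {
      // The table (and with it the schema and partition spec) is resolved
      // here, when the job is submitted. If any of these change later, the
      // running job cannot pick up the change and has to be restarted.
      TableLoader tableLoader =
          TableLoader.fromHadoopTable("hdfs://namenode:8020/warehouse/db/events");

      FlinkSink.forRowData(input)
          .tableLoader(tableLoader)
          .append();
    }
  }

The dynamic sink proposed in the document would instead derive the target
table, schema, and partition spec from the records themselves; the exact API
is described in the design doc linked above.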

If you have any questions, ideas, or suggestions, please let us know here or
in comments on the document.

Thanks,
Peter
