Hi,

I want to dump events from a Kafka topic into the data lake as a type 1
snapshot in Iceberg. Type 1 means that a record with a given key overwrites
the previous record with the same key, so each key has exactly one record in
the snapshot.
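
To make the semantics concrete, this is the kind of row-level upsert I have
in mind, written as a batch Spark SQL MERGE INTO against an Iceberg table. It
is only a minimal PySpark sketch; the catalog, table, and column names are
placeholders, and it assumes the Iceberg runtime and SQL extensions are
configured:

from pyspark.sql import SparkSession

# Placeholder names throughout; assumes the Iceberg runtime jar and SQL
# extensions are on the classpath and a catalog named "datalake" is configured.
spark = SparkSession.builder.appName("type1-merge-sketch").getOrCreate()

# A hypothetical batch of incoming events, keyed by "id".
spark.createDataFrame(
    [("k1", "v-new"), ("k3", "v-first")],
    ["id", "payload"],
).createOrReplaceTempView("updates")

# Type 1: a matched key overwrites the existing row, an unmatched key is
# inserted, so each key keeps exactly one row in the snapshot table.
spark.sql("""
    MERGE INTO datalake.db.snapshot t
    USING updates s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.payload = s.payload
    WHEN NOT MATCHED THEN INSERT (id, payload) VALUES (s.id, s.payload)
""")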

Note that I want to simplify the long path:
Kafka -> (streaming job) -> type 2 historical table -> (Spark job) -> type 1
snapshot
to the short path:
Kafka -> (streaming job) -> type 1 snapshot.

The short path requires the streaming job to update the type 1 snapshot
directly with row-level updates (e.g. MERGE INTO). The streaming job can be
either Spark Structured Streaming (SSS) or Flink.
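
Roughly, here is what I imagine the short-path SSS job would have to do,
sketched in PySpark with a MERGE run once per micro-batch. The broker, topic,
schema, checkpoint path, and target table datalake.db.snapshot are all made-up
placeholders, and it assumes the Kafka connector plus the Iceberg runtime and
SQL extensions are loaded. Whether something like this is actually supported,
and whether it performs acceptably, is exactly what I am unsure about:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("kafka-to-type1-sketch").getOrCreate()

# Hypothetical event schema; real events would carry more fields.
event_schema = StructType([
    StructField("id", StringType()),
    StructField("payload", StringType()),
])

# Read the Kafka topic and parse the JSON value into columns.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)


def upsert_batch(batch_df, batch_id):
    # Expose the micro-batch as a global temp view so the MERGE below can see
    # it from any session in this application. A real job would also need to
    # deduplicate keys within each batch before merging.
    batch_df.createOrReplaceGlobalTempView("updates")
    spark.sql("""
        MERGE INTO datalake.db.snapshot t
        USING global_temp.updates s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET t.payload = s.payload
        WHEN NOT MATCHED THEN INSERT (id, payload) VALUES (s.id, s.payload)
    """)


# Apply the upsert once per micro-batch instead of using a streaming writer.
(
    events.writeStream
    .foreachBatch(upsert_batch)
    .option("checkpointLocation", "/tmp/checkpoints/type1-snapshot")
    .start()
    .awaitTermination()
)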

According to https://iceberg.apache.org/spark-structured-streaming/, SSS
does not support MERGE INTO.

According to https://iceberg.apache.org/flink/#insert-into, a Flink streaming
job does not support MERGE INTO or INSERT INTO.

Is this a dead end? Appreciate any hints.
