Equality deletes with Flink - design question

Gabor Kaszab Mon, 25 Mar 2024 07:45:14 -0700

Hey Iceberg Community,

I've recently had the chance to examine Iceberg's equality delete support
from a multi-engine perspective (Flink, Hive, Impala, Spark).
I started exploring how *Flink* can be used for writing and observed that
there is a restriction: users are forced to add the *partition
columns to the primary key* when creating an upsert-mode table. This
came in handy because it made the eq-delete read implementation easier
on the Impala side, but it also made me curious about the original
motivation. So the questions I have in mind are:
- What was the motivation behind introducing this restriction?
- Technically, would it be possible not to force partition cols into the PK?
Are there well-known pros and cons?
- In theory, if someone removed this restriction, would the readers (for
instance Spark, since that is the engine most tightly coupled with Iceberg)
still be able to read eq-deletes that don't contain the partition cols?
- Is a change to loosen this restriction on anyone's roadmap in the
community?


Thanks,
Gabor