Hi Gabor,

I don't know the historical reasons myself, so I was hoping someone familiar with that history would chime in.
Since there was no response, here are my thoughts:

When writing CDC data, every record with the same primary key needs to end up on the same writer (this is a must). There are also good reasons to cluster data for the same partition onto the same writers (this is encouraged, to avoid creating many small files, but it is not a requirement). Implementing a write flow that combines data-based partitioning with primary-key-based partitioning is definitely possible, but probably isn't worth the effort.

On the read side: to my understanding, the Iceberg table partitioning scheme can be changed at any time, and I am not sure there are any safeguards against changing it after the table is created (it would be good to know). That said, reading equality deletes is costly. When I last checked, the readers kept the equality delete list in memory and checked it against every row they read. This might have changed in Spark, but I think Flink still does this, so restricting the scope of the deletes is always a good idea. I've put a couple of quick sketches below the quoted mail to make these two points concrete.

I hope this helps, and I still hope that somebody with a longer history with the Flink code can expand on my answers.

Thanks,
Peter

On Wed, Apr 10, 2024, 16:11 Gabor Kaszab <gaborkas...@apache.org> wrote:

> Hey Iceberg People,
>
> Just pinging this thread here. Any Flink expertise on the above questions is appreciated! :)
>
> Gabor
>
> On Mon, Mar 25, 2024 at 3:43 PM Gabor Kaszab <gaborkas...@apache.org> wrote:
>
>> Hey Iceberg Community,
>>
>> I've recently had the chance to examine Iceberg's equality delete support from a multi-engine perspective (Flink, Hive, Impala, Spark).
>> I started exploring how *Flink* can be used for writing and I observed that there is a restriction forcing users to add the *partition columns into the primary keys* when creating an upsert-mode table. This came in handy for me because it made the eq-delete read implementation easier on the Impala side, but it also made me curious about the original motivation. So the questions I have in mind are:
>> - What was the motivation behind introducing this restriction?
>> - Technically, would it be possible not to force partition cols into the PK? Are there well-known pros and cons?
>> - In theory, if someone removed this restriction, would the readers (for instance Spark, since that is the engine most closely coupled with Iceberg) still be able to read eq-deletes that don't contain the partition cols?
>> - Is a change to loosen this restriction on the roadmap for anyone in the community?
>>
>> Thanks,
>> Gabor
>>
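PS: The promised sketches. First, a minimal example of the restriction you described, assuming Flink SQL with the Iceberg connector (the table and column names are made up for illustration). In upsert mode Flink requires the partition column to be part of the primary key, which, as far as I understand, keeps each equality delete scoped to its own partition:

    -- Flink SQL sketch (illustrative names, not from this thread)
    CREATE TABLE db.events (
        id      BIGINT,
        region  STRING,
        payload STRING,
        -- without 'region' in the primary key, Flink rejects this
        -- upsert-mode table because the table is partitioned by it
        PRIMARY KEY (id, region) NOT ENFORCED
    ) PARTITIONED BY (region)
    WITH (
        'format-version' = '2',
        'write.upsert.enabled' = 'true'
    );

Second, on partition evolution: if I remember correctly, with the Iceberg SQL extensions enabled in Spark the spec of an existing table can be changed at any time, along the lines of:

    -- Spark SQL sketch, assuming the Iceberg SQL extensions are enabled
    ALTER TABLE db.events ADD PARTITION FIELD bucket(16, id);
    ALTER TABLE db.events DROP PARTITION FIELD region;

This is why scoping the deletes matters to me: if the primary key did not include the partition column, a delete for a given key could in principle need to apply to rows in other partitions, which would make the read-side merge even more expensive.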