Hi Gabor,

I don't know the historical reasons myself, so I was hoping someone familiar with that history would chime in.
Since there was no response, here are my thoughts:

When writing CDC data, every record with the same primary key needs to end up on the same writer (this is a must). There are also good reasons to cluster data for the same partition onto the same writers (this is encouraged, to avoid creating many small files, but it is not a requirement). Implementing a write flow that combines data-based partitioning with primary-key-based partitioning is definitely possible, but probably isn't worth the effort.

On the read side: to my understanding, the Iceberg table partitioning scheme can be changed at any time, and I am not sure there are any safeguards against changing it after the table is created (it would be good to know). That said, reading equality deletes is costly. When I last checked, the readers kept the equality delete list in memory and checked it against every row they read. This might have changed in Spark, but I think Flink still does this, so restricting the scope of the deletes is always a good idea. I've put a couple of quick sketches below the quoted mail to make these two points concrete.

I hope this helps, and I still hope that somebody with a longer history with the Flink code can expand on my answers.

Thanks,
Peter

On Wed, Apr 10, 2024, 16:11 Gabor Kaszab <gaborkas...@apache.org> wrote:

> Hey Iceberg People,
>
> Just pinging this thread here. Any Flink expertise on the above questions is appreciated! :)
>
> Gabor
>
> On Mon, Mar 25, 2024 at 3:43 PM Gabor Kaszab <gaborkas...@apache.org> wrote:
>
>> Hey Iceberg Community,
>>
>> I've recently had the chance to examine Iceberg's equality delete support from a multi-engine perspective (Flink, Hive, Impala, Spark).
>> I started exploring how *Flink* can be used for writing and I observed that there is a restriction forcing users to add the *partition columns into the primary keys* when creating an upsert-mode table. This came in handy for me because it made the eq-delete read implementation easier on the Impala side, but it also made me curious about the original motivation. So the questions I have in mind are:
>> - What was the motivation behind introducing this restriction?
>> - Technically, would it be possible not to force partition cols into the PK? Are there well-known pros and cons?
>> - In theory, if someone removed this restriction, would the readers (for instance Spark, since that is the engine most closely coupled with Iceberg) still be able to read eq-deletes that don't contain the partition cols?
>> - Is a change to loosen this restriction on the roadmap for anyone in the community?
>>
>> Thanks,
>> Gabor
>>
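PS: The promised sketches. First, a minimal example of the restriction you described, assuming Flink SQL with the Iceberg connector (the table and column names are made up for illustration). In upsert mode Flink requires the partition column to be part of the primary key, which, as far as I understand, keeps each equality delete scoped to its own partition:

    -- Flink SQL sketch (illustrative names, not from this thread)
    CREATE TABLE db.events (
        id      BIGINT,
        region  STRING,
        payload STRING,
        -- without 'region' in the primary key, Flink rejects this
        -- upsert-mode table because the table is partitioned by it
        PRIMARY KEY (id, region) NOT ENFORCED
    ) PARTITIONED BY (region)
    WITH (
        'format-version' = '2',
        'write.upsert.enabled' = 'true'
    );

Second, on partition evolution: if I remember correctly, with the Iceberg SQL extensions enabled in Spark the spec of an existing table can be changed at any time, along the lines of:

    -- Spark SQL sketch, assuming the Iceberg SQL extensions are enabled
    ALTER TABLE db.events ADD PARTITION FIELD bucket(16, id);
    ALTER TABLE db.events DROP PARTITION FIELD region;

This is why scoping the deletes matters to me: if the primary key did not include the partition column, a delete for a given key could in principle need to apply to rows in other partitions, which would make the read-side merge even more expensive.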