Hi,

Without knowledge of this history, my reasoning is that we would get deduplication across partitions if the partition column is not included in the identifier fields (primary keys). That doesn't look right to me.

For example, if a user table has uid as the primary key and dt as the partition column, and user data is inserted every day: when we query the table without a dt predicate, should we only return data from the latest dt?
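A minimal Flink SQL sketch of this scenario, assuming the Iceberg Flink connector with a Hadoop catalog; the catalog setup, table name, and columns are illustrative, not taken from the thread:

```sql
-- Illustrative catalog setup (paths and names are placeholders).
CREATE CATALOG hadoop_catalog WITH (
  'type' = 'iceberg',
  'catalog-type' = 'hadoop',
  'warehouse' = 'file:///tmp/iceberg_warehouse'
);
USE CATALOG hadoop_catalog;

-- With the current restriction, dt must be part of the primary key
-- for an upsert-mode table partitioned by dt.
CREATE TABLE users (
  uid  BIGINT,
  name STRING,
  dt   STRING,
  PRIMARY KEY (uid, dt) NOT ENFORCED
) PARTITIONED BY (dt) WITH (
  'format-version' = '2',
  'write.upsert.enabled' = 'true'
);

-- One row per uid is inserted every day:
INSERT INTO users VALUES (1, 'alice', '2024-04-10');
INSERT INTO users VALUES (1, 'alice', '2024-04-11');

-- Because dt is one of the identifier fields, the upsert only replaces rows
-- within the same dt, so this query returns one row per day:
SELECT * FROM users WHERE uid = 1;

-- If uid alone were the identifier and equality deletes applied across
-- partitions, the same query would return only the row from the latest dt,
-- which is the behaviour questioned above.
```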
Thanks,
Manu

On Thu, Apr 11, 2024 at 12:53 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Hi Gabor,
>
> I don't know about the historical reasons, so I was hoping someone with
> familiarity with the past would chime in.
>
> Since there was no response, here are my thoughts:
>
> When writing CDC, we need every record with the same primary key on the
> same writer (this is a must). Also, there are good reasons to cluster the
> data written for the same partition onto the same writers (this is
> encouraged, to prevent creating many small files, but it is not a
> requirement). Implementing a write flow where both data-based partitioning
> and primary-key-based partitioning are required, and combining those in
> code, is definitely possible, but probably isn't worth the effort.
>
> About the read side:
> By my understanding, the Iceberg table partitioning scheme can be changed
> at any time. I am not sure whether there are any safeguards around
> changing it after the table is created (it would be good to know).
>
> That said, reading equality deletes is a costly business. When I last
> checked, the readers keep the equality delete list in memory and check it
> against the current rows. This might have changed in Spark, but I think
> Flink still does this. So having a restriction on the scope of the deletes
> is always a good idea.
>
> I hope this helps you, and I still hope that somebody with a longer
> history with the Flink code can expand on my answers.
>
> Thanks,
> Peter
>
> On Wed, Apr 10, 2024, 16:11 Gabor Kaszab <gaborkas...@apache.org> wrote:
>
>> Hey Iceberg People,
>>
>> Just pinging this thread here. Any Flink expertise on the above
>> questions is appreciated! :)
>>
>> Gabor
>>
>> On Mon, Mar 25, 2024 at 3:43 PM Gabor Kaszab <gaborkas...@apache.org>
>> wrote:
>>
>>> Hey Iceberg Community,
>>>
>>> I've recently had the chance to examine Iceberg's equality delete
>>> support from a multi-engine perspective (Flink, Hive, Impala, Spark).
>>> I started by exploring how *Flink* can be used for writing, and I
>>> observed that there is a restriction forcing users to add the *partition
>>> columns to the primary key* when creating an upsert-mode table. This came
>>> in handy for me because it made the eq-delete read implementation easier
>>> on the Impala side, but it also made me curious about the original
>>> motivation. So the questions I have in mind are:
>>> - What was the motivation behind introducing this restriction?
>>> - Technically, would it be possible not to force partition columns into
>>> the PK? Are there well-known pros and cons?
>>> - In theory, if someone removed this restriction, would the readers (for
>>> instance Spark, since that is the engine most tightly coupled with
>>> Iceberg) still be able to read eq-deletes that don't contain the
>>> partition columns?
>>> - Is a change to loosen this restriction on the roadmap for anyone in
>>> the community?
>>>
>>> Thanks,
>>> Gabor
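For reference, a hedged sketch of the DDL that the restriction discussed in this thread concerns, again assuming the Flink Iceberg connector with illustrative identifiers; the exact place and wording of the validation may differ between connector versions:

```sql
-- The same upsert-mode table, but with dt left out of the primary key
-- (identifier fields); names are illustrative.
CREATE TABLE users_no_dt_pk (
  uid  BIGINT,
  name STRING,
  dt   STRING,
  PRIMARY KEY (uid) NOT ENFORCED        -- partition column not an identifier field
) PARTITIONED BY (dt) WITH (
  'format-version' = '2',
  'write.upsert.enabled' = 'true'
);

-- With the current restriction, the Flink Iceberg sink refuses upsert writes
-- to such a table, asking for the partition field to be included in the
-- equality fields, so a write like this fails today:
INSERT INTO users_no_dt_pk VALUES (1, 'alice', '2024-04-10');
```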