I agree with Walaa. Iceberg doesn't support partitions as specific structures, which is why it makes no sense to implement ADD PARTITION. While a DROP PARTITION may be convenient, it would actually be misleading. If you changed the partitioning of a table, DROP PARTITION would no longer work and it wouldn't be clear why. I think it is always best to express operations in terms of what you're actually trying to do (i.e. DELETE FROM) rather than relying on a physical property of the data (the partition) that might change.
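
To make the point about changing partitioning concrete, here is a minimal sketch in plain Python. It is a toy model, not Iceberg's or Spark's API: `DataFile`, `write`, `drop_partition` and `delete_from` are hypothetical names made up for illustration, and the "month" and "day" specs only loosely stand in for Iceberg's partition transforms. It assumes a table that was first partitioned by month and later evolved to partition by day.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFile:
    spec: str        # transform the file was written under: "month" or "day"
    partition: str   # partition value under that spec
    rows: tuple      # event dates stored in the file


def partition_value(spec, d):
    # Hypothetical stand-in for a partition transform.
    return d.strftime("%Y-%m") if spec == "month" else d.isoformat()


def write(files, spec, dates):
    # Group rows into one file per partition value under the given spec.
    groups = {}
    for d in dates:
        groups.setdefault(partition_value(spec, d), []).append(d)
    files.extend(DataFile(spec, p, tuple(rs)) for p, rs in groups.items())


def drop_partition(files, spec, value):
    # Physical operation: drop whole files whose partition value matches.
    return [f for f in files if not (f.spec == spec and f.partition == value)]


def delete_from(files, predicate):
    # Logical operation: remove every matching row, whatever the layout.
    # Simplification: files are rewritten here; a real engine may instead
    # drop whole files or write delete files.
    out = []
    for f in files:
        kept = tuple(r for r in f.rows if not predicate(r))
        if kept:
            out.append(DataFile(f.spec, f.partition, kept))
    return out


def all_rows(files):
    return sorted(r for f in files for r in f.rows)


files = []
write(files, "month", [date(2024, 7, 16), date(2024, 7, 17)])  # before evolution
write(files, "day", [date(2024, 7, 17), date(2024, 7, 18)])    # after evolution

target = date(2024, 7, 17)
after_drop = drop_partition(files, "day", "2024-07-17")
after_delete = delete_from(files, lambda d: d == target)

print(target in all_rows(after_drop))    # True: the row in the month file survives
print(target in all_rows(after_delete))  # False: every matching row is gone
```

In this model the drop only touches files written under the day spec, so rows for the same date that still live in the older month-partitioned file are silently left behind, while the predicate-based delete removes them regardless of layout.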

On Wed, Jul 17, 2024 at 3:24 PM Steve Zhang <hongyue_zh...@apple.com.invalid> wrote:

> Mostly agreed with Walaa’s statement above, I think partition is first
> class citizen in hive but was modeled differently in iceberg to support
> hidden partition and partition evolution.
>
> To me, the partition in hive is explicit and static, the partition clause
> in DROP PARTITION can be error prone where its column and value cannot be
> validated. We want to make hive to iceberg migration as easy as possible,
> but realizing the partition difference might be the first step. Also I
> enjoy reading this section in iceberg doc
> https://iceberg.apache.org/docs/latest/partitioning/#what-does-iceberg-do-differently
>
> Thanks,
> Steve Zhang
>
>
>
> On Jul 17, 2024, at 2:36 PM, Walaa Eldin Moustafa <wa.moust...@gmail.com>
> wrote:
>
> Hi Jean, One use case is Hive to Iceberg migration, where DROP PARTITION
> does not need to change to DELETE queries prior to the migration.
>
> That said, I am not in favor of adding this to Iceberg directly (or
> Iceberg-Spark) due to the reasons Jean mentioned. It might be possible to
> do it in a custom extension or custom connector outside Iceberg that
> is specific for the use case (e.g., the migration use case I mentioned
> above).
>
> Further, as Szhehon said, it would not make sense without ADD PARTITION.
> However, ADD PARTITION requires a spec change (since Iceberg does not
> support empty partitions but ADD PARTITION does).
>
> So overall I am -1 to DROP PARTITION in Iceberg default implementation,
> and I think it is better to consider implementing in a use case specific
> implementation.
>
> Thanks,
> Walaa.
>
>
> On Wed, Jul 17, 2024 at 12:34 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi Gabor
>>
>> Do you have user requests for that ? As Iceberg produces partitions by
>> taking column values (optionally with a transform function). So the
>> hidden partitioning doesn't require user actions. I wonder the use
>> cases for dynamic partitioning (using ADD/DROP). Is it more for
>> partition maintenance ?
>>
>> Thanks !
>> Regards
>> JB
>>
>> On Wed, Jul 17, 2024 at 11:11 AM Gabor Kaszab <gaborkas...@apache.org>
>> wrote:
>> >
>> > Hey Community,
>> >
>> > I learned recently that Spark doesn't support DROP PARTITION for
>> > Iceberg tables. I understand this is because the DROP PARTITION is
>> > something being used for Hive tables and Iceberg's model for hidden
>> > partitioning makes it unnatural to have commands like this.
>> >
>> > However, I think that DROP PARTITION would still have some value for
>> > users. In fact in Impala we implemented this even for Iceberg tables.
>> > Benefits could be:
>> > - Users having workloads on Hive tables could use their workloads
>> >   after they migrated their tables to Iceberg.
>> > - Opposed to DELETE FROM, DROP PARTITION has a guarantee that this is
>> >   going to be a metadata only operation and no delete files are going
>> >   to be written.
>> >
>> > I'm curious what the community thinks of this.
>> > Gabor
>> >
>> >
>

--
Ryan Blue
Databricks