Hey Everyone,

Thanks for the responses and sorry for the long delay in mine. Let me try to answer the questions that came up.

Yes, this has been an ask from a specific user who finds the lack of DROP PARTITION a blocker for migrating from Hive tables to Iceberg. I know our initial response was also to use DELETE FROM instead, but a) some users have grown so big that it's nearly impossible to educate all of them, and b) they are concerned that if the WHERE filter of the DELETE is not aligned with partition boundaries, they might end up with position deletes that could hurt their read performance. So they find it crucial to have a guarantee that when they try to drop data within a partition, it is either a metadata-only operation or it fails.

About ADD PARTITION: I agree it wouldn't make sense for Iceberg, but fortunately there is no user ask for it either. I think DROP PARTITION would still make sense without ADD PARTITION, as the latter would be a no-op in the Iceberg world.

I gave this some thought, and even though Iceberg's concept of partitioning is not aligned with a command like DROP PARTITION, I still see a rationale for implementing it. There are always going to be users coming from the Hive-table world, the command gives them some safety nets, and - even though I have no contributions in Spark or Iceberg-Spark - this seems to be an isolated feature that has no risk of causing regressions in the existing code. Partition evolution needs some extra thought with respect to DROP PARTITION, as the Hive world didn't have it, but if we can reach consensus on that, I feel this addition has value.

I'm not sure I understand what it means to have a use-case specific implementation instead of having it in e.g. Iceberg-Spark.

Have a nice weekend!
Gabor

On Mon, Jul 22, 2024 at 7:05 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Walaa
>
> It makes sense, thanks for pointing the use case.
>
> I agree that it's better to consider a use-case specific impl.
>
> Regards
> JB
>
> On Wed, Jul 17, 2024 at 11:36 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
> >
> > Hi Jean, One use case is Hive to Iceberg migration, where DROP PARTITION does not need to change to DELETE queries prior to the migration.
> >
> > That said, I am not in favor of adding this to Iceberg directly (or Iceberg-Spark) due to the reasons Jean mentioned. It might be possible to do it in a custom extension or custom connector outside Iceberg that is specific for the use case (e.g., the migration use case I mentioned above).
> >
> > Further, as Szhehon said, it would not make sense without ADD PARTITION. However, ADD PARTITION requires a spec change (since Iceberg does not support empty partitions but ADD PARTITION does).
> >
> > So overall I am -1 to DROP PARTITION in Iceberg default implementation, and I think it is better to consider implementing in a use case specific implementation.
> >
> > Thanks,
> > Walaa.
> >
> >
> > On Wed, Jul 17, 2024 at 12:34 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
> >>
> >> Hi Gabor
> >>
> >> Do you have user requests for that ? As Iceberg produces partitions by taking column values (optionally with a transform function). So the hidden partitioning doesn't require user actions. I wonder the use cases for dynamic partitioning (using ADD/DROP). Is it more for partition maintenance ?
> >>
> >> Thanks !
> >> Regards
> >> JB
> >>
> >> On Wed, Jul 17, 2024 at 11:11 AM Gabor Kaszab <gaborkas...@apache.org> wrote:
> >> >
> >> > Hey Community,
> >> >
> >> > I learned recently that Spark doesn't support DROP PARTITION for Iceberg tables. I understand this is because the DROP PARTITION is something being used for Hive tables and Iceberg's model for hidden partitioning makes it unnatural to have commands like this.
> >> >
> >> > However, I think that DROP PARTITION would still have some value for users. In fact in Impala we implemented this even for Iceberg tables.
> >> > Benefits could be:
> >> > - Users having workloads on Hive tables could use their workloads after they migrated their tables to Iceberg.
> >> > - Opposed to DELETE FROM, DROP PARTITION has a guarantee that this is going to be a metadata only operation and no delete files are going to be written.
> >> >
> >> > I'm curious what the community thinks of this.
> >> > Gabor
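
To illustrate the "metadata-only or fail" guarantee discussed at the top of this mail, here is a minimal Python sketch of a toy model. It is not Iceberg or Spark code: DataFile, NotMetadataOnly and drop_partition are hypothetical names made up for the illustration, and failing on files whose partition data lacks a requested column is just one possible way to treat partition evolution, not an agreed design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file entry in table metadata."""
    path: str
    partition: tuple  # (column, value) pairs, e.g. (("dt", "2024-07-01"),)


class NotMetadataOnly(Exception):
    """Raised when the drop cannot be decided from partition metadata alone."""


def drop_partition(files, spec):
    """Return the files that remain after dropping the partition `spec`.

    `spec` maps partition column -> value. Whole files are either kept or
    removed; no file is rewritten and no delete file is produced. If any
    file does not carry a value for a requested column (a non-partition
    column, or a file written under a different partition spec), the
    operation fails instead of falling back to row-level deletes.
    """
    kept = []
    for f in files:
        values = dict(f.partition)
        missing = set(spec) - set(values)
        if missing:
            raise NotMetadataOnly(
                f"{f.path} has no partition value for {sorted(missing)}"
            )
        if all(values[col] == val for col, val in spec.items()):
            continue  # the whole file belongs to the dropped partition
        kept.append(f)
    return kept


files = [
    DataFile("a.parquet", (("dt", "2024-07-01"),)),
    DataFile("b.parquet", (("dt", "2024-07-02"),)),
]
remaining = drop_partition(files, {"dt": "2024-07-01"})
```

In this toy model a filter on a non-partition column (for example {"user_id": 42}) raises NotMetadataOnly rather than silently producing deletes, which is the behavior the user is asking for.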