> b) they have a concern that with getting the WHERE filter of the DELETE not
> aligned with partition boundaries they might end up having pos-deletes that
> could have an impact on their read perf

I think this is a legitimate concern, and `DELETE FROM` currently cannot guarantee that. It would be valuable to have a way to enforce it.

As others have already pointed out, the concept of partitioning is not aligned between Iceberg and Hive, so adding `DROP PARTITION` directly might confuse people who have changed the partition spec of their table. How about adding a new SQL syntax that expresses only the delete/drop-partition semantics? Something like:

```
DELETE PARTITION FROM table WHERE delete_filter
```

The statement would only delete partitions if it can do so as a metadata-only operation.

> On Aug 2, 2024, at 20:34, Gabor Kaszab <gaborkas...@cloudera.com.INVALID>
> wrote:
>
> Hey Everyone,
>
> Thanks for the responses and sorry for the long delay in mine. Let me try to
> answer the questions that came up.
>
> Yes, this has been an ask from a specific user who finds the lack of DROP
> PARTITION as a blocker for migrating to Iceberg from Hive tables. I know, our
> initial response was too to use DELETE FROM instead but a) there are users
> who grew that big that it's nearly impossible to educate and b) they have a
> concern that with getting the WHERE filter of the DELETE not aligned with
> partition boundaries they might end up having pos-deletes that could have an
> impact on their read perf. So they find it very crucial to have a guarantee
> that when they try to drop data within a partition it's either a metadata
> only operation or it fails.
>
> About ADD PARTITION: I agree it wouldn't make sense for Iceberg, but
> fortunately there is no user ask for it either. I think DROP PARTITION would
> still make sense without ADD PARTITION as the later one would be a no-op in
> the Iceberg world.
>
> I gave this some thoughts and even though the concept of partitioning is not
> aligned with a command like DROP PARTITION, I still see rationale to
> implement it anyway.
> There are always going to be users coming from the
> Hive-table world, it has some safety nets, and - even though I have no
> contributions in Spark or Iceberg-Spark - this seems an isolated feature that
> has no risk of causing regressions in the existing code. Partition evolution
> is something that has to be given some extra thought wrt DROP PARTITION as
> the Hive-world didn't have that, but in case we can have a consensus on that
> I feel that this addition has added value.
>
> Not sure I know what it means to have a use-case specific implementation
> instead of having it in e.g. Iceberg-Spark.
>
> Have a nice weekend!
> Gabor
>
> On Mon, Jul 22, 2024 at 7:05 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>> Hi Walaa
>>
>> It makes sense, thanks for pointing the use case.
>>
>> I agree that it's better to consider a use-case specific impl.
>>
>> Regards
>> JB
>>
>> On Wed, Jul 17, 2024 at 11:36 PM Walaa Eldin Moustafa
>> <wa.moust...@gmail.com> wrote:
>> >
>> > Hi Jean, One use case is Hive to Iceberg migration, where DROP PARTITION
>> > does not need to change to DELETE queries prior to the migration.
>> >
>> > That said, I am not in favor of adding this to Iceberg directly (or
>> > Iceberg-Spark) due to the reasons Jean mentioned. It might be possible to
>> > do it in a custom extension or custom connector outside Iceberg that is
>> > specific for the use case (e.g., the migration use case I mentioned above).
>> >
>> > Further, as Szhehon said, it would not make sense without ADD PARTITION.
>> > However, ADD PARTITION requires a spec change (since Iceberg does not
>> > support empty partitions but ADD PARTITION does).
>> >
>> > So overall I am -1 to DROP PARTITION in Iceberg default implementation,
>> > and I think it is better to consider implementing in a use case specific
>> > implementation.
>> >
>> > Thanks,
>> > Walaa.
>> >
>> >
>> > On Wed, Jul 17, 2024 at 12:34 PM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>> >>
>> >> Hi Gabor
>> >>
>> >> Do you have user requests for that ? As Iceberg produces partitions by
>> >> taking column values (optionally with a transform function). So the
>> >> hidden partitioning doesn't require user actions. I wonder the use
>> >> cases for dynamic partitioning (using ADD/DROP). Is it more for
>> >> partition maintenance ?
>> >>
>> >> Thanks !
>> >> Regards
>> >> JB
>> >>
>> >> On Wed, Jul 17, 2024 at 11:11 AM Gabor Kaszab <gaborkas...@apache.org> wrote:
>> >> >
>> >> > Hey Community,
>> >> >
>> >> > I learned recently that Spark doesn't support DROP PARTITION for
>> >> > Iceberg tables. I understand this is because the DROP PARTITION is
>> >> > something being used for Hive tables and Iceberg's model for hidden
>> >> > partitioning makes it unnatural to have commands like this.
>> >> >
>> >> > However, I think that DROP PARTITION would still have some value for
>> >> > users. In fact in Impala we implemented this even for Iceberg tables.
>> >> > Benefits could be:
>> >> > - Users having workloads on Hive tables could use their workloads
>> >> > after they migrated their tables to Iceberg.
>> >> > - Opposed to DELETE FROM, DROP PARTITION has a guarantee that this is
>> >> > going to be a metadata only operation and no delete files are going to
>> >> > be written.
>> >> >
>> >> > I'm curious what the community thinks of this.
>> >> > Gabor
>> >> >
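
P.S. To make the "metadata-only operation or it fails" guarantee discussed in this thread a bit more concrete, here is a rough sketch in Python. It is a toy model and not Iceberg or Spark code: `DataFile`, `NotMetadataOnlyError`, and `delete_partition` are hypothetical names, and I am assuming that each data file covers a half-open range `[lo, hi)` of the partition source column (for example, hours since some epoch for a day-partitioned table) and that the delete filter is also a half-open range. The delete succeeds only if every file is either fully inside the filter or fully outside it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a data file and the value range its partition covers."""
    path: str
    lo: int  # inclusive lower bound of the partition's value range
    hi: int  # exclusive upper bound of the partition's value range


class NotMetadataOnlyError(Exception):
    """Raised when the filter would remove only part of a file."""


def delete_partition(files: List[DataFile], flo: int, fhi: int) -> List[DataFile]:
    """Return the files that remain after deleting values in [flo, fhi).

    Whole files are dropped (the metadata-only case). If any file is only
    partially covered by the filter, nothing is deleted and an error is raised.
    """
    if flo >= fhi:
        # Empty filter range: nothing to delete.
        return list(files)

    remaining = []
    for f in files:
        fully_covered = flo <= f.lo and f.hi <= fhi
        disjoint = f.hi <= flo or fhi <= f.lo
        if fully_covered:
            continue  # the whole file can be dropped from metadata
        if not disjoint:
            # Partial overlap: a row-level delete would be needed, so fail.
            raise NotMetadataOnlyError(
                f"filter [{flo}, {fhi}) only partially covers {f.path}"
            )
        remaining.append(f)
    return remaining


# Three day partitions, expressed in hours.
files = [DataFile("day0", 0, 24), DataFile("day1", 24, 48), DataFile("day2", 48, 72)]

# Aligned with the partition boundary: day1 is dropped as a whole.
kept = delete_partition(files, 24, 48)

# Not aligned (half a day): the call fails instead of writing delete files.
try:
    delete_partition(files, 24, 36)
    failed = False
except NotMetadataOnlyError:
    failed = True
```

A real implementation would have to reason about partition transforms and partition spec evolution rather than plain integer ranges, which is exactly the part that needs the extra thought mentioned above.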