Re: [DISCUSS] DROP PARTITION in Spark

Ryan Blue Wed, 17 Jul 2024 15:28:26 -0700

I agree with Walaa. Iceberg doesn't support partitions as specific
structures, which is why it makes no sense to implement ADD PARTITION.
While a DROP PARTITION may be convenient, it would actually be misleading.
If you changed the partitioning of a table, DROP PARTITION would no longer
work and it wouldn't be clear why. I think it is always best to express
operations in terms of what you're actually trying to do (i.e. DELETE FROM)
rather than relying on a physical property of the data (the partition) that
might change.
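
The point about partitioning changes can be sketched with a toy model. This is not Iceberg or Spark code: `DataFile`, `drop_partition`, and `delete_from` are hypothetical names made up for illustration, and the table is assumed to have been partitioned by month first and by day later. It only shows one way a partition-based drop can miss rows that a predicate-based delete would catch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFile:
    spec: str         # hypothetical label for the partition spec in effect at write time
    partition: tuple  # partition tuple under that spec
    rows: tuple       # event dates of the rows stored in the file


def drop_partition(files, spec, partition):
    """Hive-style drop: remove files whose physical partition tuple matches."""
    return [f for f in files if (f.spec, f.partition) != (spec, partition)]


def delete_from(files, predicate):
    """Logical delete: remove every matching row, whatever the file layout."""
    kept = []
    for f in files:
        remaining = tuple(r for r in f.rows if not predicate(r))
        if remaining:
            kept.append(DataFile(f.spec, f.partition, remaining))
    return kept


target = date(2024, 7, 17)
files = [
    # written while the (assumed) table was partitioned by month
    DataFile("month", ("2024-07",), (date(2024, 7, 1), target)),
    # written after the table was repartitioned by day
    DataFile("day", ("2024-07-17",), (target,)),
]

after_drop = drop_partition(files, "day", ("2024-07-17",))
after_delete = delete_from(files, lambda d: d == target)

rows_after_drop = [r for f in after_drop for r in f.rows]
rows_after_delete = [r for f in after_delete for r in f.rows]
```

In this sketch, dropping the day partition leaves a 2024-07-17 row behind in the older month-partitioned file, while the predicate-based delete removes it. A real engine would plan the delete very differently (for example by dropping whole files or writing delete files); the model only illustrates the difference in intent.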


On Wed, Jul 17, 2024 at 3:24 PM Steve Zhang <hongyue_zh...@apple.com.invalid>
wrote:

> Mostly agreed with Walaa’s statement above. I think partitions are
> first-class citizens in Hive but were modeled differently in Iceberg to
> support hidden partitioning and partition evolution.
>
> To me, a partition in Hive is explicit and static, and the partition
> clause in DROP PARTITION can be error-prone, as its column and value cannot
> be validated. We want to make Hive-to-Iceberg migration as easy as
> possible, but recognizing the partition difference might be the first
> step. Also, I enjoy reading this section in the Iceberg docs:
> https://iceberg.apache.org/docs/latest/partitioning/#what-does-iceberg-do-differently
>
> Thanks,
> Steve Zhang
>
>
>
> On Jul 17, 2024, at 2:36 PM, Walaa Eldin Moustafa <wa.moust...@gmail.com>
> wrote:
>
> Hi Jean, one use case is Hive-to-Iceberg migration, where existing DROP
> PARTITION statements would not need to be rewritten as DELETE queries
> prior to the migration.
>
> That said, I am not in favor of adding this to Iceberg directly (or
> Iceberg-Spark) due to the reasons Jean mentioned. It might be possible to
> do it in a custom extension or custom connector outside Iceberg that
> is specific for the use case (e.g., the migration use case I mentioned
> above).
>
> Further, as Szehon said, it would not make sense without ADD PARTITION.
> However, ADD PARTITION requires a spec change (since Iceberg does not
> support empty partitions but ADD PARTITION does).
>
> So overall I am -1 on DROP PARTITION in the default Iceberg implementation,
> and I think it is better to consider implementing it in a use-case-specific
> implementation.
>
> Thanks,
> Walaa.
>
>
> On Wed, Jul 17, 2024 at 12:34 PM Jean-Baptiste Onofré <j...@nanthrax.net>
> wrote:
>
>> Hi Gabor
>>
>> Do you have user requests for that? Iceberg produces partitions by
>> taking column values (optionally with a transform function), so
>> hidden partitioning doesn't require user actions. I wonder what the use
>> cases are for dynamic partitioning (using ADD/DROP). Is it more for
>> partition maintenance?
>>
>> Thanks !
>> Regards
>> JB
>>
>> On Wed, Jul 17, 2024 at 11:11 AM Gabor Kaszab <gaborkas...@apache.org>
>> wrote:
>> >
>> > Hey Community,
>> >
>> > I learned recently that Spark doesn't support DROP PARTITION for
>> > Iceberg tables. I understand this is because DROP PARTITION is
>> > something used for Hive tables, and Iceberg's model for hidden
>> > partitioning makes it unnatural to have commands like this.
>> >
>> > However, I think that DROP PARTITION would still have some value for
>> > users. In fact, in Impala we implemented this even for Iceberg tables.
>> > Benefits could be:
>> >  - Users with workloads on Hive tables could keep using those workloads
>> >    after migrating their tables to Iceberg.
>> >  - As opposed to DELETE FROM, DROP PARTITION guarantees that this is
>> >    a metadata-only operation and no delete files are written.
>> >
>> > I'm curious what the community thinks of this.
>> > Gabor
>> >
>>
>
>

-- 
Ryan Blue
Databricks
