Re: [DISCUSS] DROP PARTITION in Spark

Steve Zhang Wed, 17 Jul 2024 15:24:43 -0700

I mostly agree with Walaa’s statement above. I think partitions are first-class 
citizens in Hive but are modeled differently in Iceberg to support hidden 
partitioning and partition evolution.


To me, partitions in Hive are explicit and static, and the partition clause in 
DROP PARTITION can be error-prone because its column and value cannot be 
validated. We want to make Hive-to-Iceberg migration as easy as possible, but 
recognizing the partitioning differences might be the first step. I also enjoyed 
reading this section of the Iceberg docs: 
https://iceberg.apache.org/docs/latest/partitioning/#what-does-iceberg-do-differently
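
To make the explicit-versus-hidden distinction concrete, here is a toy sketch 
in plain Python. It is not Iceberg's or Hive's API: the dictionaries, the 
days() helper, and both drop/delete functions are hypothetical names made up 
for illustration. It only shows why a literal partition spec can silently 
miss, while a value derived from the column cannot be mistyped in that way.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days(ts):
    """Toy stand-in for a day transform: whole days since the Unix epoch."""
    return (ts - EPOCH).days


# Hive style (toy model): the partition is a separate, user-supplied string key.
hive_table = {
    "dt=2024-07-16": ["a.parquet"],
    "dt=2024-07-17": ["b.parquet"],
}


def hive_drop_partition(table, spec):
    """Drop by literal spec. Nothing checks the spec against the data,
    so a mistyped spec simply matches no partition."""
    return table.pop(spec, None) is not None


# Hidden-partitioning style (toy model): rows are grouped by a value that is
# derived from the timestamp column itself, so the user never spells it out.
rows = [
    datetime(2024, 7, 16, 9, 0, tzinfo=timezone.utc),
    datetime(2024, 7, 17, 15, 24, tzinfo=timezone.utc),
]
hidden_table = {}
for ts in rows:
    hidden_table.setdefault(days(ts), []).append(ts)


def delete_day(table, any_ts_in_day):
    """Remove the whole group for the day containing the given timestamp.
    The partition value is computed, not typed by the caller."""
    return table.pop(days(any_ts_in_day), None) is not None


# A spec with a missing zero ("2024-7-16") removes nothing from the Hive-style table.
dropped = hive_drop_partition(hive_table, "dt=2024-7-16")

# Any timestamp inside the day resolves to the same derived partition value.
deleted = delete_day(hidden_table, datetime(2024, 7, 16, 23, 59, tzinfo=timezone.utc))
```

In this sketch the mistyped Hive-style spec leaves both partitions in place, 
while the derived lookup removes exactly the one day it was asked about.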

Thanks,
Steve Zhang



> On Jul 17, 2024, at 2:36 PM, Walaa Eldin Moustafa <wa.moust...@gmail.com> 
> wrote:
> 
> Hi Jean, One use case is Hive to Iceberg migration, where DROP PARTITION does 
> not need to change to DELETE queries prior to the migration.
> 
> That said, I am not in favor of adding this to Iceberg directly (or 
> Iceberg-Spark) due to the reasons Jean mentioned. It might be possible to do 
> it in a custom extension or custom connector outside Iceberg that is specific 
> for the use case (e.g., the migration use case I mentioned above).
> 
> Further, as Szehon said, it would not make sense without ADD PARTITION. 
> However, ADD PARTITION requires a spec change (since Iceberg does not support 
> empty partitions but ADD PARTITION does).
> 
> So overall I am -1 to DROP PARTITION in Iceberg default implementation, and I 
> think it is better to consider implementing in a use case specific 
> implementation.
> 
> Thanks,
> Walaa.
> 
> 
> On Wed, Jul 17, 2024 at 12:34 PM Jean-Baptiste Onofré <j...@nanthrax.net> 
> wrote:
>> Hi Gabor
>> 
>> Do you have user requests for that? Iceberg produces partitions by
>> taking column values (optionally with a transform function), so
>> hidden partitioning doesn't require user action. I wonder what the use
>> cases are for dynamic partitioning (using ADD/DROP). Is it more for
>> partition maintenance?
>> 
>> Thanks !
>> Regards
>> JB
>> 
>> On Wed, Jul 17, 2024 at 11:11 AM Gabor Kaszab <gaborkas...@apache.org> 
>> wrote:
>> >
>> > Hey Community,
>> >
>> > I learned recently that Spark doesn't support DROP PARTITION for Iceberg 
>> > tables. I understand this is because the DROP PARTITION is something being 
>> > used for Hive tables and Iceberg's model for hidden partitioning makes it 
>> > unnatural to have commands like this.
>> >
>> > However, I think that DROP PARTITION would still have some value for 
>> > users. In fact in Impala we implemented this even for Iceberg tables. 
>> > Benefits could be:
>> >  - Users with workloads on Hive tables could keep using those workloads 
>> > after migrating their tables to Iceberg.
>> >  - As opposed to DELETE FROM, DROP PARTITION guarantees that this is 
>> > going to be a metadata-only operation and no delete files are going to be 
>> > written.
>> >
>> > I'm curious what the community thinks of this.
>> > Gabor
>> >
