Thanks for the answers! Sorry, I didn't drop the subject; I just had other
priorities, but I still find this topic interesting to discuss.
Understood, DROP PARTITION can't happen.
*Thanks Anton* for showing some interest and sharing some alternatives!
I checked the canDeleteWhere() and canDeleteU
I still wonder if there is a clean way for users to ensure a DELETE
statement is purely a metadata operation. We, of course, should focus on
declarative commands, but the cost of executing a row-level DELETE can be
unacceptable in some cases. I remember multiple teams were asking for that
to prevent
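For illustration, the distinction I mean looks roughly like this (a sketch;
`db.events` is a made-up table assumed to be partitioned by days(ts), and
whether the first statement stays metadata-only depends on the planner
matching whole data files):

  -- Predicate aligned with the days(ts) partition boundary: Iceberg can
  -- drop whole data files, i.e. a metadata-only delete.
  DELETE FROM db.events WHERE ts < TIMESTAMP '2024-01-01 00:00:00';

  -- Predicate on an unpartitioned column: rows must be rewritten or
  -- marked deleted, which is the expensive row-level path.
  DELETE FROM db.events WHERE user_id = 42;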
> We want users to think _less_ about how their operations are physically
> carried out. It is the responsibility of Iceberg and Spark to reduce the cost
> so that the user doesn't need to care.
> They should tell Spark what should happen, not how to do it.
I completely agree with your points.
> That’s true, maybe we can start with a session conf or consulting the
> Spark community to add the ability to enforce deletion via metadata
> operation only?
I don't think this is the right direction. We want users to think _less_
about how their operations are physically carried out. It is the
responsibility of Iceberg and Spark to reduce the cost so that the user
doesn't need to care. They should tell Spark what should happen, not how
to do it.
> we would instead add support for pushing down `CAST` expressions from Spark
Supporting push-down of more expressions is definitely worth exploring.
IIUC, we should already be able to do this kind of thing thanks to system
function push down. Users can issue a query to deterministically
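For example (a sketch; `db.events` is a hypothetical table partitioned by
days(ts), and it assumes the current catalog is an Iceberg catalog so the
`system.*` functions resolve):

  -- The days() system function mirrors the partition transform, so the
  -- predicate lines up exactly with partition boundaries and can be
  -- pushed down as a partition filter.
  DELETE FROM db.events WHERE system.days(ts) = DATE '2024-07-17';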
There's a potential solution that's similar to what Xianjin suggested.
Rather than adding a new SQL keyword (which is a lot of work and specific
to Iceberg) we would instead add support for pushing down `CAST`
expressions from Spark. That way you could use filters like `DELETE FROM
table WHERE cast
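Something along these lines (hypothetical: this pushdown does not exist
today, and `db.events` with a days(ts) spec is just an assumption for the
example):

  -- If Spark could push the CAST down, this predicate could be mapped
  -- onto the days(ts) partition transform and deleted via metadata:
  DELETE FROM db.events WHERE CAST(ts AS DATE) = DATE '2024-07-17';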
> b) they have a concern that with getting the WHERE filter of the DELETE not
> aligned with partition boundaries they might end up having pos-deletes that
> could have an impact on their read perf
I think this is a legit concern and currently `DELETE FROM` cannot guarantee
that. It would be va
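To make the concern concrete (a sketch; the same hypothetical
days(ts)-partitioned `db.events`, with merge-on-read delete mode assumed):

  -- The cutoff falls in the middle of a day partition, so matching files
  -- cannot simply be dropped; in merge-on-read mode this writes position
  -- deletes that every subsequent read has to merge.
  DELETE FROM db.events WHERE ts < TIMESTAMP '2024-01-01 12:00:00';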
Hey Everyone,
Thanks for the responses and sorry for the long delay in mine. Let me try
to answer the questions that came up.
Yes, this has been an ask from a specific user who finds the lack of DROP
PARTITION a blocker for migrating to Iceberg from Hive tables. I know,
our initial response wa
Hi Walaa
It makes sense, thanks for pointing out the use case.
I agree that it's better to consider a use-case specific impl.
Regards
JB
On Wed, Jul 17, 2024 at 11:36 PM Walaa Eldin Moustafa
wrote:
>
> Hi Jean, One use case is Hive to Iceberg migration, where DROP PARTITION does
> not need to change to DELETE queries prior to the migration.
I agree with Walaa. Iceberg doesn't support partitions as specific
structures, which is why it makes no sense to implement ADD PARTITION.
While a DROP PARTITION may be convenient, it would actually be misleading.
If you changed the partitioning of a table, DROP PARTITION would no longer
work and it
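A sketch of why (using Iceberg's Spark DDL extensions on a hypothetical
table that started out partitioned by days(ts), so the old partition field
got the default name ts_day):

  -- Partition evolution: new data is written with the new spec, while
  -- existing files keep the old one.
  ALTER TABLE db.events REPLACE PARTITION FIELD ts_day WITH hours(ts);

  -- After this, a Hive-style DROP PARTITION clause has no single
  -- partition layout to refer to, which is the misleading part.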
Mostly agreed with Walaa’s statement above. I think partition is a first-class
citizen in Hive but was modeled differently in Iceberg to support hidden
partitioning and partition evolution.
To me, the partition in Hive is explicit and static; the partition clause in
DROP PARTITION can be error-prone
Hi Jean, One use case is Hive to Iceberg migration, where DROP PARTITION
does not need to change to DELETE queries prior to the migration.
That said, I am not in favor of adding this to Iceberg directly (or
Iceberg-Spark) due to the reasons Jean mentioned. It might be possible to
do it in a custom
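For reference, the migration pattern being discussed looks roughly like
this (table and partition column names are made up):

  -- Hive, before migration:
  ALTER TABLE db.events DROP PARTITION (ds = '2024-07-17');

  -- Iceberg, after migration: the same intent expressed declaratively.
  DELETE FROM db.events WHERE ds = '2024-07-17';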
Hi Gabor
Do you have user requests for that? Since Iceberg produces partitions by
taking column values (optionally with a transform function), hidden
partitioning doesn't require user actions. I wonder about the use
cases for dynamic partitioning (using ADD/DROP). Is it more for
partition maintenance?
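For context, this is what hidden partitioning looks like from the user
side (a sketch with a made-up table):

  CREATE TABLE db.events (id BIGINT, ts TIMESTAMP)
  USING iceberg
  PARTITIONED BY (days(ts));

  -- Writers never mention partitions; Iceberg derives them from ts.
  INSERT INTO db.events VALUES (1, TIMESTAMP '2024-07-17 10:00:00');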
Based on my observations, users don't appear to be missing this feature,
but I'm OK with adding it in Spark for compatibility purposes.
Yufei
On Wed, Jul 17, 2024 at 11:14 AM Szehon Ho wrote:
> Hi Gabor
>
> I'm neutral on this, but can be convinced. My initial thought is that
> there would be no
Hi Gabor
I'm neutral on this, but can be convinced. My initial thought is that
there would be no way to have ADD PARTITION (I assume old Hive workloads
would rely on this), and these are not ANSI SQL standard statements, as
Spark moves in that direction.
The second point of guaranteeing a metad
Hey Community,
I learned recently that Spark doesn't support DROP PARTITION for Iceberg
tables. I understand this is because DROP PARTITION is something used for
Hive tables, and Iceberg's model of hidden partitioning makes it
unnatural to have commands like this.
However, I think that