My thought is that, just as Iceberg has to define partitioning and bucketing, it
has to define a canonical sort order. In particular, we can’t afford to have
Spark, Presto, and Hive writing files in different orders. I believe the right
approach is to define a sort order as a series of columns where
Hey,
The issue you pointed out is about tracking Iceberg tables in HMS and leveraging
HMS locks to commit metadata instead of relying on renames. This allows Iceberg
to reliably manage metadata when it is persisted in object stores.
At the same time, it is possible to migrate Spark tables to Iceb
Hey folks,
Iceberg users are advised not only to partition their data but also to sort
within partitions by the columns used in query predicates in order to get the best
performance. Right now, this process is mostly manual and performed by users
before writing.
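
To make the benefit concrete, here is a rough sketch in plain Python (not Iceberg or Spark code) of why sorting within a partition helps. It assumes readers skip a file whenever the predicate value falls outside that file's min/max range for the column. The column name `user_id`, the sample values, and the helper functions are all made up for illustration.

```python
def split_into_files(rows, rows_per_file):
    """Chunk rows into fixed-size 'files', the way a writer rolls over output files."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_to_scan(files, column, value):
    """Count files whose [min, max] range for `column` could contain `value`."""
    count = 0
    for f in files:
        values = [row[column] for row in f]
        if min(values) <= value <= max(values):
            count += 1
    return count


# Hypothetical rows that all belong to a single partition.
rows = [{"user_id": u} for u in [7, 1, 9, 3, 8, 2, 6, 4]]

# Written in arrival order: every file covers a wide range of user_id values.
unsorted_files = split_into_files(rows, 2)

# Sorted by the predicate column before writing: each file covers a narrow range.
sorted_files = split_into_files(sorted(rows, key=lambda r: r["user_id"]), 2)

unsorted_hits = files_to_scan(unsorted_files, "user_id", 4)
sorted_hits = files_to_scan(sorted_files, "user_id", 4)

print(f"files scanned without sorting: {unsorted_hits}")
print(f"files scanned with sorting:    {sorted_hits}")
```

With this toy data, a lookup for `user_id = 4` has to open every file when rows are written in arrival order, but only one file when they are sorted first.
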
I am wondering if we should extend Iceberg meta