Re: Sort Spec

2019-07-01 Thread Owen O'Malley
My thought is that, just as Iceberg has to define partitioning and bucketing, it has to define a canonical sort order. In particular, we can’t afford to have Spark, Presto, and Hive writing files in different orders. I believe the right approach is to define a sort order as a series of columns where

Re: Convert hive table to iceberg table

2019-07-01 Thread Anton Okolnychyi
Hey, The issue you pointed out is about tracking Iceberg tables in HMS and leveraging HMS locks to commit metadata instead of relying on renames. This allows Iceberg to reliably manage metadata when it is persisted in object stores. At the same time, it is possible to migrate Spark tables to Iceb

Sort Spec

2019-07-01 Thread Anton Okolnychyi
Hey folks, Iceberg users are advised not only to partition their data but also to sort within partitions by columns in predicates in order to get the best performance. Right now, this process is mostly manual and performed by users before writing. I am wondering if we should extend Iceberg meta