Let me summarize what we talked here and follow up with a PR.

- Iceberg should allow users to define a sort oder in its metadata that applies 
to partitions.
- We should never assume the sort order is actually applied to all files in the 
table.
- Sort orders might evolve and change over time. When this happens, existing 
files will not be rewritten. Query engines should follow the updated sort order 
during subsequent writes. As a result, files within a table or partition can be 
sorted differently at a given point in time.
- We should be able to define a sort order even for unpartitioned tables, as 
opposed to current Spark tables that allow a sort order only for bucketed 
tables.
- SortOrder is separate from PartitionSpec.
- SortOrder will rely on transformations to define complex sort orders.
- Files will be annotated with sort_order_id instead of sort_columns. We keep 
the question of file_ordinal open for now.
- To begin with, we will support asc/desc natural sort orders (UTF8 ordering 
for Strings).

Thanks,
Anton

> On 16 Jul 2019, at 23:56, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> 
> I agree that Iceberg metadata should include a way to configure a desired 
> sort order. But I want to note that I don’t think that we can ever assume 
> that it has been applied. Table configuration will evolve as use changes. We 
> don’t want to require rewrites when a configuration gets updated, so an 
> assumption should be that data files might not be sorted.
> 
> Files that are sorted should indicate how they are sorted, so that 
> optimizations are applied if the file’s metadata indicates it can be safely 
> applied. For example, if both deletes and data rows are sorted the same way, 
> you can merge the two streams instead of using a hash set to check whether a 
> record has been deleted. I think this should rely on the delete file’s sort 
> order matching the data file it is applied to.
> 
> Should Iceberg allow users to define a sort spec only if the table is 
> bucketed?
> 
> No. In Iceberg, bucketing is just another partition transform.
> 
> However, I think we need to consider what a sort order will mean. Here are a 
> few observations:
> 
> Each file can have a sort order for its rows (Spark’s sortWithinPartitions, 
> which sorts each task’s data)
> Sorting is also used to cluster values across files so it makes sense for a 
> table sort order to be applied within partitions (ORDER BY)
> Multiple writes to the same partition are not expected to rewrite existing 
> data, so a partition may only be partially sorted or may have multiple sorted 
> file sets
> Partitioning is independent from sorting. Even when partitioning is 
> orthogonal to a sort order (i.e., bucketing), partitioning must still take 
> precedence.
> My conclusion is that a configured sort order applies to partitions, not data 
> across partitions. Again, bucketing is just another type of partition.
> 
> How should Iceberg encode sort specs?
> 
> I don’t think this should be in table properties. The sort order should 
> reference columns by ID so it doesn’t need to be changed when columns are 
> renamed. I think this should be implemented like PartitionSpec.
> 
> If sorting is applied within partitions, then I would make PartitionSpec and 
> SortOrder separate. I would still use transforms to produce more complex sort 
> orders. I think that’s a great idea, but we don’t need to mix partitioning 
> and sorting to reuse transforms. Like partition specs, I think a table should 
> be able to define multiple sort orders and each should be identified by ID. 
> Then each data file can encode which sort order it was written with, just 
> like manifests and partition specs.
> 
> I think we should add sort-orders like partition-specs, and a 
> default-sort-order-id like default-spec-id. This would also require removing 
> sort_columns from data files in the spec 
> <http://iceberg.apache.org/spec/#manifests> and adding sort_order_id. We can 
> keep file_ordinal, but probably want to add some context to know the group of 
> files where it is valid. We could also remove it.
> 
> Which sort orders should Iceberg support?
> 
> I agree with what’s already been said: we should use a natural order for each 
> type 
> <https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>,
>  ascending and descending. To start, Strings must use UTF-8’s natural 
> ordering and we can expand from there.
> 
> Here’s what a sort order might look like:
> 
> "sort-orders": [
>     { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] },
>     { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, { 
> "source-id": 5, "ascending": false } ] },
>     { "id": 3, "ordering": [ { "transform": "zorder", "ascending": false, 
> "source-ids": [4, 5] } ] },
>   ]
> 
> On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi 
> <aokolnyc...@apple.com.invalid> wrote:
> In order to begin prototyping, I would start with the following questions.
> 
> 1) Does Iceberg need a sort spec?
>       - I would say yes
> 2) Should Iceberg allow users to define a sort spec only if the table is 
> bucketed?
>       - I would say no, as it seems valid to have partitioned and sorted 
> tables.
> 3) How should Iceberg encode sort specs?
>       - Option #1 is to rely on table properties, which will allow us to use 
> ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am 
> not sure it would be easy to encode non-trivial sort specs and track sort 
> spec evolution (if needed).
>       - Option #2 is to extend PartitionSpec to cover sorting as well. This 
> option will allow us to use transformations to encode non-trivial sorts and 
> won't require many changes to the codebase.
>       - Option #3 is to store SortSpec separately from PartitionSpec. This 
> will require more changes compared to Option #2 but can also give us extra 
> flexibility.
> 
> Each option has its own trade-offs, but I tend to think #2 is reasonable.
> 
> 4) Which sort orders should Iceberg support?
>       - I think we have to be flexible and support adding more sort orders 
> later. In addition to what Owen said, we can add sorting based on 
> multi-dimensional space-filling curves in the future.
> 
> 
> What do you think?
> 
> Thanks,
> Anton
> 
>> On 1 Jul 2019, at 18:06, Owen O'Malley <owen.omal...@gmail.com 
>> <mailto:owen.omal...@gmail.com>> wrote:
>> 
>> My thought is just like Iceberg has to define partitioning and bucketing, it 
>> has to define a canonical sort order. In particular, we can’t afford to have 
>> Spark, Presto, and Hive writing files in different orders. I believe the 
>> right approach is to define a sort order as a series of columns where each 
>> column is either ascending or descending and defining the natural sort order 
>> for each type.
>> 
>> The hard bit will be if we need to support non-natural sorts of strings. For 
>> example, if we need to support case-insensitive sorts or the different 
>> collations that databases support, I’d hope that we could start with the 
>> default of utf-8 byte ordering and expand as needed. If you are curious what 
>> the different collations look like - 
>> https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database
>>  
>> <https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database>
>>  .
>> 
>> .. Owen
>> 
>>> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <aokolnyc...@apple.com.INVALID 
>>> <mailto:aokolnyc...@apple.com.INVALID>> wrote:
>>> 
>>> Hey folks,
>>> 
>>> Iceberg users are advised not only to partition their data but also to sort 
>>> within partitions by columns in predicates in order to get the best 
>>> performance. Right now, this process is mostly manual and performed by 
>>> users before writing.
>>> I am wondering if we should extend Iceberg metadata so that query engines 
>>> can do this automatically in the future. We already have `sortColumns` in 
>>> DataFile but they are not used.
>>> Do we need a notion of sort columns in TableMetadata?
>>> Spark’s sort spec is tightly coupled with bucketing and cannot be used 
>>> alone. However, it seems reasonable to have partitioned and sorted tables 
>>> without bucketing. How do we see this in Iceberg?
>>> If we decide to have sort spec in the metadata, do we want to make it part 
>>> of PartitionSpec or have it separately?
>>> Thanks,
>>> Anton
>>> 
>> 
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix

Reply via email to