I agree that Iceberg metadata should include a way to configure a desired
sort order. But I want to note that I don’t think that we can ever assume
that it has been applied. Table configuration will evolve as use changes.
We don’t want to require rewrites when a configuration gets updated, so an
assumption should be that data files might not be sorted.

Files that are sorted should indicate how they are sorted, so that an
optimization is applied only when the file’s metadata indicates it can be
applied safely. For example, if both deletes and data rows are sorted the
same way, you can merge the two streams instead of using a hash set to check
whether a record has been deleted. I think this should rely on the delete
file’s sort order matching that of the data file it is applied to.
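
To make that concrete, here is a minimal sketch of the merge idea, assuming
both inputs are iterators that are already sorted by the same key; the class
and names (SortedDeleteFilter, sortKey) are illustrative, not Iceberg APIs:

import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Illustrative only: merges a sorted stream of data rows with a sorted stream
// of delete keys, emitting rows that are not deleted. No hash set is needed
// because both inputs share the same sort order.
class SortedDeleteFilter<R, K extends Comparable<K>> implements Iterator<R> {
  private final Iterator<R> dataRows;      // sorted by sortKey
  private final Iterator<K> deleteKeys;    // sorted the same way
  private final Function<R, K> sortKey;    // extracts the sort key from a row
  private K nextDelete;
  private R nextRow;

  SortedDeleteFilter(Iterator<R> dataRows, Iterator<K> deleteKeys, Function<R, K> sortKey) {
    this.dataRows = dataRows;
    this.deleteKeys = deleteKeys;
    this.sortKey = sortKey;
    this.nextDelete = deleteKeys.hasNext() ? deleteKeys.next() : null;
    advance();
  }

  private void advance() {
    nextRow = null;
    while (dataRows.hasNext()) {
      R row = dataRows.next();
      K key = sortKey.apply(row);
      // skip delete keys that fall behind the current row
      while (nextDelete != null && nextDelete.compareTo(key) < 0) {
        nextDelete = deleteKeys.hasNext() ? deleteKeys.next() : null;
      }
      if (nextDelete == null || nextDelete.compareTo(key) != 0) {
        nextRow = row;   // not deleted
        return;
      }
      // key matches a delete: drop the row and keep scanning
    }
  }

  @Override
  public boolean hasNext() {
    return nextRow != null;
  }

  @Override
  public R next() {
    if (nextRow == null) {
      throw new NoSuchElementException();
    }
    R result = nextRow;
    advance();
    return result;
  }
}

If the sort orders don’t match, the reader would fall back to the hash-set
approach.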

Should Iceberg allow users to define a sort spec only if the table is
bucketed?

No. In Iceberg, bucketing is just another partition transform.

However, I think we need to consider what a sort order will mean. Here are
a few observations:

   - Each file can have a sort order for its rows (Spark’s
   sortWithinPartitions, which sorts each task’s data)
   - Sorting is also used to cluster values across files, so it makes sense
   for a table sort order to be applied within partitions (ORDER BY)
   - Multiple writes to the same partition are not expected to rewrite
   existing data, so a partition may only be partially sorted or may have
   multiple sorted file sets
   - Partitioning is independent of sorting. Even when partitioning is
   orthogonal to a sort order (i.e., bucketing), partitioning must still take
   precedence.

My conclusion is that a configured sort order applies within partitions, not
to data across partitions. Again, bucketing is just another kind of
partitioning.

How should Iceberg encode sort specs?

I don’t think this should be in table properties. The sort order should
reference columns by ID so it doesn’t need to be changed when columns are
renamed. I think this should be implemented like PartitionSpec.

If sorting is applied within partitions, then I would keep PartitionSpec
and SortOrder separate. Using transforms to produce more complex sort orders
is a great idea, but we don’t need to mix partitioning and sorting to reuse
transforms. Like partition specs, I think a table should be able to define
multiple sort orders, each identified by ID. Then each data file can record
which sort order it was written with, just like manifests record their
partition specs.
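
As a rough sketch of what a separate SortOrder modeled on PartitionSpec
could look like (the class and field names here are hypothetical, not a
proposal for the final API):

import java.util.List;

// Hypothetical sketch only: a sort order identified by ID, made of fields
// that reference source columns by ID and reuse transforms, mirroring
// PartitionSpec.
class SortOrder {
  private final int orderId;            // referenced by data files as sort_order_id
  private final List<SortField> fields;

  SortOrder(int orderId, List<SortField> fields) {
    this.orderId = orderId;
    this.fields = fields;
  }

  int orderId() {
    return orderId;
  }

  List<SortField> fields() {
    return fields;
  }

  static class SortField {
    private final int sourceId;         // column ID, stable across renames
    private final String transform;     // e.g. "identity"; reuses partition transforms
    private final boolean ascending;

    SortField(int sourceId, String transform, boolean ascending) {
      this.sourceId = sourceId;
      this.transform = transform;
      this.ascending = ascending;
    }
  }
}

Keeping the order ID on each data file means a reader can tell exactly which
files a given optimization applies to, even after the default order changes.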

I think we should add sort-orders like partition-specs, and a
default-sort-order-id like default-spec-id. This would also require
removing sort_columns from data files in the spec
<http://iceberg.apache.org/spec/#manifests> and adding sort_order_id. We
could keep file_ordinal, but we would probably want to add some context that
identifies the group of files where it is valid. We could also remove it.

Which sort orders should Iceberg support?

I agree with what’s already been said: we should use a natural order for
each type
<https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/types/Comparators.java#L31-L45>,
ascending and descending. To start, Strings must use UTF-8’s natural
ordering and we can expand from there.
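
As an illustration of what UTF-8 byte ordering means in practice, here is a
sketch that compares the encoded bytes as unsigned values; it is not the
Comparators implementation linked above, just the ordering it should agree
with:

import java.nio.charset.StandardCharsets;

// Illustrative only: compares strings by the unsigned byte order of their
// UTF-8 encoding, which matches code point order (unlike String.compareTo,
// which compares UTF-16 code units and can order supplementary characters
// differently).
class Utf8Ordering {
  static int compare(String a, String b) {
    byte[] left = a.getBytes(StandardCharsets.UTF_8);
    byte[] right = b.getBytes(StandardCharsets.UTF_8);
    int len = Math.min(left.length, right.length);
    for (int i = 0; i < len; i++) {
      int cmp = Integer.compare(left[i] & 0xFF, right[i] & 0xFF);
      if (cmp != 0) {
        return cmp;
      }
    }
    return Integer.compare(left.length, right.length);
  }
}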

Here’s what a sort order might look like:

"sort-orders": [
    { "id": 1, "ordering": [ { "source-id": 4, "ascending": true } ] },
    { "id": 2, "ordering": [ { "source-id": 4, "ascending": true }, {
"source-id": 5, "ascending": false } ] },
    { "id": 3, "ordering": [ { "transform": "zorder", "ascending":
false, "source-ids": [4, 5] } ] },
  ]


On Thu, Jul 4, 2019 at 9:46 AM Anton Okolnychyi
<aokolnyc...@apple.com.invalid> wrote:

> In order to begin prototyping, I would start with the following questions.
>
> 1) Does Iceberg need a sort spec?
>   - I would say yes
> 2) Should Iceberg allow users to define a sort spec only if the table is
> bucketed?
> - I would say no, as it seems valid to have partitioned and sorted tables.
> 3) How should Iceberg encode sort specs?
> - Option #1 is to rely on table properties, which will allow us to use
> ALTER TABLE ... SET TBLPROPERTIES to configure sorting specs. However, I am
> not sure it would be easy to encode non-trivial sort specs and track sort
> spec evolution (if needed).
> - Option #2 is to extend PartitionSpec to cover sorting as well. This
> option will allow us to use transformations to encode non-trivial sorts and
> won't require many changes to the codebase.
>   - Option #3 is to store SortSpec separately from PartitionSpec. This
> will require more changes compared to Option #2 but can also give us extra
> flexibility.
>
> Each option has its own trade-offs, but I tend to think #2 is reasonable.
>
> 4) Which sort orders should Iceberg support?
> - I think we have to be flexible and support adding more sort orders
> later. In addition to what Owen said, we can add sorting based on
> multi-dimensional space-filling curves in the future.
>
>
> What do you think?
>
> Thanks,
> Anton
>
> On 1 Jul 2019, at 18:06, Owen O'Malley <owen.omal...@gmail.com> wrote:
>
> My thought is just like Iceberg has to define partitioning and bucketing,
> it has to define a canonical sort order. In particular, we can’t afford to
> have Spark, Presto, and Hive writing files in different orders. I believe
> the right approach is to define a sort order as a series of columns where
> each column is either ascending or descending and defining the natural sort
> order for each type.
>
> The hard bit will be if we need to support non-natural sorts of strings.
> For example, if we need to support case-insensitive sorts or the different
> collations that databases support, I’d hope that we could start with the
> default of utf-8 byte ordering and expand as needed. If you are curious
> what the different collations look like -
> https://dba.stackexchange.com/questions/94887/what-is-the-impact-of-lc-ctype-on-a-postgresql-database
>  .
>
> .. Owen
>
> On Jul 1, 2019, at 4:18 AM, Anton Okolnychyi <
> aokolnyc...@apple.com.INVALID> wrote:
>
> Hey folks,
>
> Iceberg users are advised not only to partition their data but also to
> sort within partitions by columns in predicates in order to get the best
> performance. Right now, this process is mostly manual and performed by
> users before writing.
> I am wondering if we should extend Iceberg metadata so that query engines
> can do this automatically in the future. We already have `sortColumns` in
> DataFile but they are not used.
>
>    - Do we need a notion of sort columns in TableMetadata?
>    - Spark’s sort spec is tightly coupled with bucketing and cannot be
>    used alone. However, it seems reasonable to have partitioned and sorted
>    tables without bucketing. How do we see this in Iceberg?
>    - If we decide to have sort spec in the metadata, do we want to make
>    it part of PartitionSpec or have it separately?
>
> Thanks,
> Anton
>
>
>
>

-- 
Ryan Blue
Software Engineer
Netflix
