Re: Sort Spec

2019-07-19 Thread Anton Okolnychyi
ed, so an
>> assumption should be that data files might not be sorted.
>>
>> Files that are sorted should indicate how they are sorted, so that
>> optimizations are applied if the file’s metadata indicates it can be safely
>> applied. For example, if both deletes a

Re: Sort Spec

2019-07-18 Thread Ryan Blue
onfiguration gets
>>>> updated, so an assumption should be that data files might not be sorted.
>>>>
>>>> Files that are sorted should indicate how they are sorted, so that
>>>> optimizations are applied if the file’s metadata indicates it can be saf

Re: Sort Spec

2019-07-18 Thread Owen O'Malley
rites when a configuration gets
>>> updated, so an assumption should be that data files might not be sorted.
>>>
>>> Files that are sorted should indicate how they are sorted, so that
>>> optimizations are applied if the file’s metadata indicates it can be saf

Re: Sort Spec

2019-07-18 Thread Ryan Blue
applied if the file’s metadata indicates it can be safely
>> applied. For example, if both deletes and data rows are sorted the same
>> way, you can merge the two streams instead of using a hash set to check
>> whether a record has been deleted. I think this should rely on the de
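The merge mentioned in this message can be sketched as follows. This is only an illustration of the general technique (a single pass over two streams sorted the same way), not Iceberg code: the function name `merge_filter_deleted` and the use of plain comparable keys for both rows and deletes are assumptions made for the example.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def merge_filter_deleted(data_rows: Iterable[T], deleted_keys: Iterable[T]) -> Iterator[T]:
    """Yield the rows of data_rows that do not appear in deleted_keys.

    Hypothetical helper: both inputs are assumed to be sorted ascending
    by the same order, so no hash set of deleted keys is needed.
    """
    deletes = iter(deleted_keys)
    exhausted = object()  # marker for "no more deletes"
    current = next(deletes, exhausted)

    for row in data_rows:
        # Skip delete keys that sort before the current row; they can
        # no longer match this row or any later one.
        while current is not exhausted and current < row:
            current = next(deletes, exhausted)
        if current is not exhausted and current == row:
            continue  # row has a matching delete, drop it
        yield row


live = list(merge_filter_deleted([1, 2, 3, 5, 8], [2, 5, 7]))
print(live)
```

Each stream is read once and only one pending delete key is held in memory, whereas a hash-set check has to load every delete key before the first row is read. The sketch produces wrong results if either input is not actually sorted, which is why the sort order has to be recorded in file metadata before an engine can rely on it.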

Re: Sort Spec

2019-07-18 Thread Owen O'Malley
a hash set to check
> whether a record has been deleted. I think this should rely on the delete
> file’s sort order matching the data file it is applied to.
>
> Should Iceberg allow users to define a sort spec only if the table is
> bucketed?
>
> No. In Iceberg, bucketi

Re: Sort Spec

2019-07-18 Thread Anton Okolnychyi
ther a
> record has been deleted. I think this should rely on the delete file’s sort
> order matching the data file it is applied to.
>
> Should Iceberg allow users to define a sort spec only if the table is
> bucketed?
>
> No. In Iceberg, bucketing is just another parti

Re: Sort Spec

2019-07-16 Thread Ryan Blue
streams instead of using a hash set to check whether a record has been deleted. I think this should rely on the delete file’s sort order matching the data file it is applied to. Should Iceberg allow users to define a sort spec only if the table is bucketed? No. In Iceberg, bucketing is just

Re: Sort Spec

2019-07-04 Thread Anton Okolnychyi
In order to begin prototyping, I would start with the following questions.
1) Does Iceberg need a sort spec? - I would say yes
2) Should Iceberg allow users to define a sort spec only if the table is bucketed? - I would say no, as it seems valid to have partitioned and sorted

Re: Sort Spec

2019-07-01 Thread Owen O'Malley
hey are not used.
> Do we need a notion of sort columns in TableMetadata?
> Spark’s sort spec is tightly coupled with bucketing and cannot be used alone.
> However, it seems reasonable to have partitioned and sorted tables without
> bucketing. How do we see this in Iceberg?
> If w

Sort Spec

2019-07-01 Thread Anton Okolnychyi
metadata so that query engines can do this automatically in the future. We already have `sortColumns` in DataFile but they are not used. Do we need a notion of sort columns in TableMetadata? Spark’s sort spec is tightly coupled with bucketing and cannot be used alone. However, it seems reasonable to