ed, so an
>> assumption should be that data files might not be sorted.
>>
>> Files that are sorted should indicate how they are sorted, so that
>> optimizations are applied if the file’s metadata indicates it can be safely
>> applied. For example, if both deletes a
rites when a configuration gets
>>> updated, so an assumption should be that data files might not be sorted.
>>>
>>> Files that are sorted should indicate how they are sorted, so that
>>> optimizations are applied if the file’s metadata indicates it can be saf
applied if the file’s metadata indicates it can be safely
>> applied. For example, if both deletes and data rows are sorted the same
>> way, you can merge the two streams instead of using a hash set to check
>> whether a record has been deleted. I think this should rely on the de
ther a
> record has been deleted. I think this should rely on the delete file’s sort
> order matching the data file it is applied to.
>
> Should Iceberg allow users to define a sort spec only if the table is
> bucketed?
>
> No. In Iceberg, bucketing is just another parti
streams instead of using a hash set to check
whether a record has been deleted. I think this should rely on the delete
file’s sort order matching the data file it is applied to.
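
A minimal sketch of the idea above, in Python rather than Iceberg's own Java. Every name here (`merge_filter`, `hash_filter`, `filter_deleted`, and the sort-order tuples) is hypothetical and not part of any Iceberg API. It assumes delete files carry only the keys of deleted rows, and that a sort order can be compared as a plain tuple of column names, with `None` meaning "not known to be sorted".

```python
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

_EXHAUSTED = object()  # sentinel marking the end of the delete stream


def merge_filter(data_rows: Iterable[Any], delete_keys: Iterable[Any],
                 key: Callable[[Any], Any]) -> Iterator[Any]:
    """Drop deleted rows by walking two streams sorted on the same key."""
    deletes = iter(delete_keys)
    pending = next(deletes, _EXHAUSTED)
    for row in data_rows:
        row_key = key(row)
        # Skip delete keys that sort before the current row.
        while pending is not _EXHAUSTED and pending < row_key:
            pending = next(deletes, _EXHAUSTED)
        if pending is not _EXHAUSTED and pending == row_key:
            continue  # row is deleted
        yield row


def hash_filter(data_rows: Iterable[Any], delete_keys: Iterable[Any],
                key: Callable[[Any], Any]) -> Iterator[Any]:
    """Fallback: load all delete keys into a set (keys must be hashable)."""
    deleted = set(delete_keys)
    return (row for row in data_rows if key(row) not in deleted)


def filter_deleted(data_rows: Iterable[Any], delete_keys: Iterable[Any],
                   data_sort: Optional[Tuple[str, ...]],
                   delete_sort: Optional[Tuple[str, ...]],
                   key: Callable[[Any], Any]) -> Iterator[Any]:
    """Use the merge only when both files declare the same sort order."""
    if data_sort is not None and data_sort == delete_sort:
        return merge_filter(data_rows, delete_keys, key)
    return hash_filter(data_rows, delete_keys, key)


rows = [(1, "a"), (2, "b"), (3, "c"), (5, "e")]
kept = list(filter_deleted(rows, [2, 4, 5], ("id",), ("id",), key=lambda r: r[0]))
```

The merge path holds only one delete key in memory at a time, while the fallback stays correct when either file is unsorted or the declared orders differ, which matches the assumption that data files might not be sorted.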

Should Iceberg allow users to define a sort spec only if the table is
bucketed?

No. In Iceberg, bucketing is just
In order to begin prototyping, I would start with the following questions.

1) Does Iceberg need a sort spec?
   - I would say yes
2) Should Iceberg allow users to define a sort spec only if the table is
   bucketed?
   - I would say no, as it seems valid to have partitioned and sorted
hey are not used.
> Do we need a notion of sort columns in TableMetadata?
> Spark’s sort spec is tightly coupled with bucketing and cannot be used alone.
> However, it seems reasonable to have partitioned and sorted tables without
> bucketing. How do we see this in Iceberg?
> If w
metadata so that query engines can
do this automatically in the future. We already have `sortColumns` in DataFile
but they are not used.
Do we need a notion of sort columns in TableMetadata?
Spark’s sort spec is tightly coupled with bucketing and cannot be used alone.
However, it seems reasonable to