Re: Question on ordering on partitions when read

Chen Song Thu, 25 Mar 2021 08:06:04 -0700

Bumping this question.

On Wed, Mar 24, 2021 at 2:01 PM Chen Song <chen.song...@gmail.com> wrote:


> I want to clarify the ordering semantics (and whether they are
> deterministic) for partitions returned when reading with the Iceberg core
> data API.
>
> Say I define a table with a *time* column, partition it by *day(time)*, and
> do the following writes.
>
> partition (day)    time                   other data fields
> 2020-10-01         2020-10-01 01:01:01    ...
> 2020-10-01         2020-10-01 02:01:01    ...
> 2020-10-02         2020-10-02 01:01:01    ...
> 2020-10-02         2020-10-02 02:01:01    ...
>
> Then I read everything using something like the following.
>
>     IcebergGenerics.read(table).build();
>
> I see rows returned in the right order in terms of partitions. If I then
> append the same data again and read again, I see rows returned like this.
>
> 2020-10-01         2020-10-01 01:01:01    ...
> 2020-10-01         2020-10-01 02:01:01    ...
> 2020-10-02         2020-10-02 01:01:01    ...
> 2020-10-02         2020-10-02 02:01:01    ...
> 2020-10-01         2020-10-01 01:01:01    ...
> 2020-10-01         2020-10-01 02:01:01    ...
> 2020-10-02         2020-10-02 01:01:01    ...
> 2020-10-02         2020-10-02 02:01:01    ...
>
> In other words, the rows are returned ordered first by commit time, then
> by partition *day*. If I want to ensure that data from partition
> 2020-10-01 is always returned before 2020-10-02 in the above example, is
> there a way to configure the reader to do that? I checked the reader API
> and cannot find a method for it.
>
> Please note that I am NOT talking about sorting within a partition,
> which I know has to be enforced by the writer.
>
> --
> Chen Song
>
>
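One possible client-side workaround, sketched under the assumption that the
reader itself offers no ordering option: collect the rows and apply a stable
sort on the partition value. Row, exampleCommit, and orderByPartition are
hypothetical names made up for this illustration and are not part of the
Iceberg API. The sketch buffers every row in memory, so it only suits small
results.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class PartitionOrderedRead {
    // Hypothetical stand-in for a row read back from the table (not an Iceberg class).
    static final class Row {
        final LocalDate day;       // partition value, i.e. day(time)
        final LocalDateTime time;  // the *time* column

        Row(LocalDateTime time) {
            this.time = time;
            this.day = time.toLocalDate();
        }
    }

    // The four rows from the example, as written by a single commit.
    static List<Row> exampleCommit() {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row(LocalDateTime.of(2020, 10, 1, 1, 1, 1)));
        rows.add(new Row(LocalDateTime.of(2020, 10, 1, 2, 1, 1)));
        rows.add(new Row(LocalDateTime.of(2020, 10, 2, 1, 1, 1)));
        rows.add(new Row(LocalDateTime.of(2020, 10, 2, 2, 1, 1)));
        return rows;
    }

    // Returns a copy ordered by partition day. List.sort is stable, so rows
    // within the same day keep the relative order in which they were read.
    static List<Row> orderByPartition(List<Row> rowsInReadOrder) {
        List<Row> sorted = new ArrayList<>(rowsInReadOrder);
        sorted.sort(Comparator.comparing((Row r) -> r.day));
        return sorted;
    }

    public static void main(String[] args) {
        // Simulate the read order reported above: commit 1 followed by commit 2.
        List<Row> readOrder = new ArrayList<>(exampleCommit());
        readOrder.addAll(exampleCommit());
        for (Row r : orderByPartition(readOrder)) {
            System.out.println(r.day + "  " + r.time);
        }
    }
}
```

Because the sort is stable, it changes only the order across partitions and
leaves the within-partition order exactly as the writer produced it. For
larger tables, a variation might be to plan the scan with
table.newScan().planFiles(), sort the resulting tasks by file().partition(),
and read the files in that order instead of buffering rows.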

-- 
Chen Song
