Thanks Ryan and Kyle for your input! I am glad we are directionally aligned
on the importance and convenience of this feature.

Regarding the choice between table history and snapshot creation times,
table history makes a lot of sense since it captures a change log. This also
aligns with the current implementation of asOfTime().
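
To make the table-history interpretation concrete, here is a minimal sketch
(my own illustration, not the actual asOfTime() implementation) of resolving
a timestamp to the snapshot that was current at that time by walking the
snapshot log. resolveSnapshotAt is a hypothetical helper and assumes a
loaded Iceberg Table:

import org.apache.iceberg.HistoryEntry;
import org.apache.iceberg.Table;

public class HistoryLookup {
  // Hypothetical helper: returns the ID of the snapshot that was the table's
  // current snapshot at timestampMillis, based on the table history (the
  // snapshot log is ordered by event time), or -1 if the table had no
  // current snapshot yet at that time.
  static long resolveSnapshotAt(Table table, long timestampMillis) {
    long snapshotId = -1L;
    for (HistoryEntry entry : table.history()) {
      if (entry.timestampMillis() <= timestampMillis) {
        snapshotId = entry.snapshotId();
      }
    }
    return snapshotId;
  }
}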

I think the result of a SQL query over a timestamp (or snapshot ID) range
should be the changes that took place on the table during that time, ideally
capturing both insertions and deletions. That brings us to Ryan's great
point about rollbacks, which I think can be modeled as deletions. For
example, let us assume the following sequence of events (hopefully it
reflects Ryan's example):

Snapshot A is committed at timestamp 1
Snapshot B is committed at timestamp 2
Snapshot C is committed at timestamp 3
Table is rolled back to Snapshot B at timestamp 4

The table history would look like:
(Snapshot ID, event time)
(A, 1)
(B, 2)
(C, 3)
(B, 4)

The rollback issue arises when someone queries the table for VERSIONS
BETWEEN SYSTEM TIME 3 AND 4, which is a forward range in time but translates
to the snapshot range (C, B) from the table history. (C, B) may appear to be
an invalid range (since B was created before C); however, it can be
considered acceptable if we treat reverse snapshot ranges as delete
operations. In that case, whatever was consumed between timestamp 2 and
timestamp 3 is taken back (deleted) when consuming from timestamp 3 to
timestamp 4.
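
As a rough sketch of that interpretation (again my own illustration;
classify is a hypothetical helper, assumes a loaded Iceberg Table, and uses
a creation-timestamp comparison as a stand-in for a proper ancestry check),
the resolution step could look like:

import org.apache.iceberg.Table;

public class TimeSliceResolution {
  enum ChangeKind { INSERTS, DELETES }

  // Hypothetical helper: once a timestamp range has been mapped to a
  // (fromSnapshotId, toSnapshotId) pair via the table history, decide how to
  // interpret the slice. If the "to" snapshot was created before the "from"
  // snapshot (the rollback case, e.g. the (C, B) range above), the slice is
  // treated as deletions of what the forward range previously produced.
  static ChangeKind classify(Table table, long fromSnapshotId, long toSnapshotId) {
    long fromCreatedAt = table.snapshot(fromSnapshotId).timestampMillis();
    long toCreatedAt = table.snapshot(toSnapshotId).timestampMillis();
    return toCreatedAt >= fromCreatedAt ? ChangeKind.INSERTS : ChangeKind.DELETES;
  }
}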

That leads to the question of how to model a SQL query like SELECT * FROM T
VERSIONS BETWEEN SYSTEM TIME ... AND ... from the user's and the engine's
points of view.

As stated above, I think the answer is that the result should capture the
changes that happened between those times, and also carry extra metadata
(e.g., an extra column) to indicate whether each change was an insert or a
delete. That, of course, means that engines need to support those semantics
in their data source APIs too. Does that sound reasonable?
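
For illustration only (the names and shape here are my own, not an existing
engine API), the rows returned by such a query could look something like:

public class ChangeRow {
  // The extra metadata column proposed above: whether the row was inserted
  // or deleted within the requested timestamp/snapshot range.
  enum ChangeType { INSERT, DELETE }

  final Object[] tableColumns; // the table's own (projected) columns
  final ChangeType changeType; // extra metadata column

  ChangeRow(Object[] tableColumns, ChangeType changeType) {
    this.tableColumns = tableColumns;
    this.changeType = changeType;
  }
}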

Thanks,
Walaa.


On Sat, Jan 8, 2022 at 12:58 AM Kyle Bendickson <k...@tabular.io> wrote:

> Thank you Ryan for summarizing that so well.
>
> I'm in agreement that it's too convenient to simply ignore because of those
> caveats, admittedly large though they may be.
>
> However, some people don't interact with their table that way, and I often
> see discussion around ways to implement incremental scans. I agree this
> syntax is reasonable and that it's useful enough to implement despite the
> challenges.
>
> I believe there's been some previous work on this. Without looking at the
> other thread again, I do think we should either resurrect that previous
> work or try to prioritize this.
>
> Especially with other convenience mechanisms such as snapshots and
> branches on the horizon, I would hope to get this in sooner rather than
> later so that its existence can be considered as those new features are
> developed.
>
> - Kyle (kbendick)
>
>
> On Fri, Jan 7, 2022 at 4:35 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Walaa,
>>
>> Supporting syntax for VERSIONS BETWEEN SYSTEM TIME ... AND ... seems
>> reasonable to me. I think it’s often really nice to be able to select the
>> changes between two points in time for debugging. It would also be nice to
>> be able to do the same for snapshot IDs, so you could reliably use similar
>> syntax for incremental consumption.
>>
>> There are some challenges with implementing VERSIONS BETWEEN ... that I
>> want to highlight, though. First, the FOR SYSTEM_TIME syntax produces
>> the result that would have been returned if you ran the query at that time.
>> That uses the table history instead of snapshot creation timestamps. If you
>> update the table and then roll back to a previous snapshot, FOR
>> SYSTEM_TIME will read the rolled back snapshot if the timestamp is in
>> the interval when it was the table’s current state.
>>
>> We will need to decide whether VERSIONS BETWEEN ... uses table history
>> or snapshot creation timestamps. If it uses table history, there may be
>> intervals that don’t have a linear history and the query would fail. If it
>> uses snapshot creation timestamps, then you’d be able to select intervals
>> that never really existed, like between commits in a transaction.
>>
>> So there are major issues with timestamps. We would want to make it clear
>> that timestamps are for convenience, and not for incremental consumption
>> (because of the issue from that thread) and may not reflect the actual
>> table state (or may fail). That said, it’s a really convenient feature and
>> I would support adding VERSIONS BETWEEN with both timestamps and snapshot
>> IDs.
>>
>> Ryan
>>
>> On Thu, Jan 6, 2022 at 9:52 PM Walaa Eldin Moustafa <
>> wa.moust...@gmail.com> wrote:
>>
>>> Hi Iceberg devs,
>>>
>>> We have been considering the problem of Time-sliced incremental scan
>>> (i.e., reading data that is committed between two timestamps), and I ran
>>> into this thread [1] in the Iceberg dev mailing list. The summary of the
>>> thread is that incremental scan should leverage snapshot IDs as opposed to
>>> timestamps since there is no guarantee that commit timestamps are linear.
>>>
>>> I wanted to follow up on that discussion to see if folks are open to
>>> still supporting time-sliced incremental scan APIs with the caveat above.
>>> The reasons are manifold:
>>>
>>> * Time-slice APIs are more human-friendly and reduce the need for a
>>> state store to track the last read snapshot IDs. While some sort of state
>>> recording may be required in some cases, it is not always the case,
>>> especially if data consumption happens at regular intervals.
>>>
>>> * I understand the caveat discussed in the thread still applies to the
>>> existing "TIMESTAMP AS OF" API, yet the API is supported. By extension, I
>>> think it is fair to extend the support to time-sliced incremental scan.
>>>
>>> * Iceberg already provides a deterministic function to translate a
>>> timestamp to a snapshot ID [2]. Since a table can be scanned by snapshot
>>> range, I think it is reasonable to allow scanning by timestamp range, given
>>> that the translation mechanism already exists.
>>>
>>> * Incremental scan using timestamp range is part of the SQL Standard
>>> (e.g. "VERSIONS BETWEEN SYSTEM TIME ... AND ...") and is supported by some
>>> existing engines. See [3] for SQL Server support.
>>>
>>> * Conceptually, it is possible to implement the same query semantics at
>>> the SQL level using existing APIs and operators such as "TIMESTAMP AS OF"
>>> and "EXCEPT" (e.g., by selecting: (T as of timestamp1) EXCEPT (T as of
>>> timestamp2)). It sounds like incremental scan is an optimization that pushes
>>> the differencing operation down to the data source instead of letting the
>>> engine deal with it, so it is better to implement the SQL shorthand and
>>> its corresponding data source optimization.
>>>
>>> Therefore, I would like to ask if we can proceed with supporting an API
>>> along the lines of "VERSIONS BETWEEN SYSTEM TIME ... AND ...", and consider
>>> the discussion in [1] as a caveat that folks using this API (or existing
>>> timestamp APIs) have to keep in mind, until the implementation guarantees
>>> linearity at some point in the future.
>>>
>>> Thanks,
>>> Walaa.
>>>
>>> [1] https://www.mail-archive.com/dev@iceberg.apache.org/msg01504.html
>>> [2]
>>> https://github.com/apache/iceberg/blob/1a11038baaca9f4cf03674654a6d9a85f39a1ce1/core/src/main/java/org/apache/iceberg/util/SnapshotUtil.java#L265
>>> [3]
>>> https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-tables?view=sql-server-ver15
>>>
>>
>>
>> --
>> Ryan Blue
>> Tabular
>>
>
