Thanks for the replies Ryan and Amogh,

> Time travel relies on history which captures all the changes on the main
> table state.

Just to make this explicit: does "history" here refer to the "snapshot-log" in the spec?

> We decided the first option is easier to understand and is what people
> expect. That way if you're debugging an old job, you get the same version
> it would have read, even if there are later changes like fast-forwarding
> the current state to a staged snapshot after validating it, or rolling back.

Makes sense, thanks for the context.

> For “I assume in this case users need to query the underlying Iceberg
> metadata to determine a snapshot of interest)?” just curious how were you
> planning on doing this (bearing in mind time travel relies on history)?

Originally, I was thinking of time travel as the second option that Ryan
mentioned, in which case it seemed like a metadata-only operation. Given
how time travel is currently defined, this still seems doable by walking the
"metadata-log" and opening historical metadata files, but it would be less
efficient and is probably not worth the effort.
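
To check my understanding of the two options, here is a minimal Python sketch
over a toy table-metadata dict. The field names ("snapshot-log", "snapshots",
"parent-snapshot-id", "timestamp-ms", "current-snapshot-id") are the ones in
the spec; the helper names and the sample data are made up for illustration,
and this is not meant to mirror how the reference implementations are written.

```python
from typing import Optional

# Toy metadata: snapshot 2 was current for a while, then the table was
# rolled back to snapshot 1, and snapshot 3 was committed on top of 1.
metadata = {
    "current-snapshot-id": 3,
    "snapshots": [
        {"snapshot-id": 1, "parent-snapshot-id": None, "timestamp-ms": 100},
        {"snapshot-id": 2, "parent-snapshot-id": 1, "timestamp-ms": 200},
        {"snapshot-id": 3, "parent-snapshot-id": 1, "timestamp-ms": 500},
    ],
    "snapshot-log": [
        {"timestamp-ms": 100, "snapshot-id": 1},
        {"timestamp-ms": 200, "snapshot-id": 2},
        {"timestamp-ms": 400, "snapshot-id": 1},  # rollback
        {"timestamp-ms": 500, "snapshot-id": 3},
    ],
}


def as_of_from_log(meta: dict, ts_ms: int) -> Optional[int]:
    """Option 1: the snapshot a query would have read at ts_ms.

    Assumes the log entries are ordered by timestamp, oldest first.
    """
    result = None
    for entry in meta["snapshot-log"]:
        if entry["timestamp-ms"] <= ts_ms:
            result = entry["snapshot-id"]
    return result


def as_of_from_ancestry(meta: dict, ts_ms: int) -> Optional[int]:
    """Option 2: the newest ancestor of the current snapshot at or before ts_ms."""
    by_id = {s["snapshot-id"]: s for s in meta["snapshots"]}
    current = by_id.get(meta["current-snapshot-id"])
    while current is not None:
        if current["timestamp-ms"] <= ts_ms:
            return current["snapshot-id"]
        current = by_id.get(current["parent-snapshot-id"])
    return None


print(as_of_from_log(metadata, 250))       # option 1
print(as_of_from_ancestry(metadata, 250))  # option 2
```

With this toy data the two options disagree for a timestamp of 250: the log
lookup returns the rolled-back snapshot 2, while the ancestry walk skips it
and returns snapshot 1.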

Do you think it pays to add a note for implementers in the specification
that the "snapshot-log" (assuming I got the correct field) is what is used
in reference implementations for time-travel (apologies if this is already
covered and I missed it)?

Thanks,
Micah





On Tue, Apr 25, 2023 at 4:33 PM Ryan Blue <b...@tabular.io> wrote:

> Everything Amogh said is correct, but I can give a bit more context.
>
> There are two options for the behavior of time travel by timestamp. First,
> you can read the state of the table that you _would have read_ if you ran
> the query at that time. Second, you could read the ancestor of the current
> state that was "current" at that time.
>
> We decided the first option is easier to understand and is what people
> expect. That way if you're debugging an old job, you get the same version
> it would have read, even if there are later changes like fast-forwarding
> the current state to a staged snapshot after validating it, or rolling back.
>
> Ryan
>
> On Tue, Apr 25, 2023 at 3:35 PM Jahagirdar, Amogh
> <jaham...@amazon.com.invalid> wrote:
>
>> Hi Micah,
>>
>>
>>
>> Your understanding is right, as of today there is no mechanism for
>> performing time travel on a branch. Time travel relies on history which
>> captures all the changes on the main table state. At present there is no
>> history metadata for branches (we can’t use snapshot lineages); for more
>> details, check out this PR comment.
>> <https://github.com/apache/iceberg/pull/5364#issuecomment-1227902420>
>>
>> For “I assume in this case users need to query the underlying Iceberg
>> metadata to determine a snapshot of interest)?” just curious how were you
>> planning on doing this (bearing in mind time travel relies on history)?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> Amogh Jahagirdar
>>
>>
>>
>> *From: *Micah Kornfield <emkornfi...@gmail.com>
>> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
>> *Date: *Tuesday, April 25, 2023 at 3:09 PM
>> *To: *Iceberg Dev List <dev@iceberg.apache.org>
>> *Subject: *[EXTERNAL] SQL Syntax for Time Travel on a Branch?
>>
>>
>>
>>
>>
>>
>> Looking through the documents for Spark SQL syntax [1], it appears that
>> Iceberg supports reading a branch at the latest version or time-travel on
>> the main table, but I didn't see any queries that compose the two.
>>
>>
>>
>> Is my understanding correct that there isn't existing SQL for time travel
>> on a specific branch (I assume in this case users need to query the
>> underlying Iceberg metadata to determine a snapshot of interest)?
>>
>>
>>
>> Thanks,
>>
>> Micah
>>
>>
>>
>> [1] https://iceberg.apache.org/docs/latest/spark-queries/
>>
>
>
> --
> Ryan Blue
> Tabular
>
