Thanks for the replies, Ryan and Amogh.

> Time travel relies on history which captures all the changes on the main
> table state.

Just to make this explicit: does "history" here refer to the "snapshot-log" in the spec?

> We decided the first option is easier to understand and is what people
> expect. That way if you're debugging an old job, you get the same version
> it would have read, even if there are later changes like fast-forwarding
> the current state to a staged snapshot after validating it, or rolling back.

Makes sense, thanks for the context.

> For “I assume in this case users need to query the underlying Iceberg
> metadata to determine a snapshot of interest)?” just curious how were you
> planning on doing this (bearing in mind time travel relies on history)?

Originally, I was thinking of time travel as the second option that Ryan mentioned, in which case it seemed like a metadata-only operation. Given how time travel is currently defined, this still seems doable, though less efficiently, by using the "metadata-log" and opening historic files, but it is probably not worth the effort.

Do you think it is worth adding a note for implementers in the specification that the "snapshot-log" (assuming I got the correct field) is what reference implementations use for time travel? Apologies if this is already covered and I missed it.

Thanks,
Micah

On Tue, Apr 25, 2023 at 4:33 PM Ryan Blue <b...@tabular.io> wrote:

> Everything Amogh said is correct, but I can give a bit more context.
>
> There are two options for the behavior of time travel by timestamp. First,
> you can read the state of the table that you _would have read_ if you ran
> the query at that time. Second, you could read the ancestor of the current
> state that was "current" at that time.
>
> We decided the first option is easier to understand and is what people
> expect. That way if you're debugging an old job, you get the same version
> it would have read, even if there are later changes like fast-forwarding
> the current state to a staged snapshot after validating it, or rolling back.
>
> Ryan
>
> On Tue, Apr 25, 2023 at 3:35 PM Jahagirdar, Amogh
> <jaham...@amazon.com.invalid> wrote:
>
>> Hi Micah,
>>
>> Your understanding is right, as of today there is no mechanism for
>> performing time travel on branch. Time travel relies on history which
>> captures all the changes on the main table state. At present there is no
>> history metadata for branches (we can’t use snapshot lineages), for more
>> details checkout this PR comment.
>> <https://github.com/apache/iceberg/pull/5364#issuecomment-1227902420>
>>
>> For “I assume in this case users need to query the underlying Iceberg
>> metadata to determine a snapshot of interest)?” just curious how were you
>> planning on doing this (bearing in mind time travel relies on history)?
>>
>> Thanks,
>>
>> Amogh Jahagirdar
>>
>> *From: *Micah Kornfield <emkornfi...@gmail.com>
>> *Reply-To: *"dev@iceberg.apache.org" <dev@iceberg.apache.org>
>> *Date: *Tuesday, April 25, 2023 at 3:09 PM
>> *To: *Iceberg Dev List <dev@iceberg.apache.org>
>> *Subject: *[EXTERNAL] SQL Syntax for Time Travel on a Branch?
>>
>> *CAUTION*: This email originated from outside of the organization. Do
>> not click links or open attachments unless you can confirm the sender and
>> know the content is safe.
>>
>> Looking through the documents for Spark SQL syntax [1], it appears that
>> Iceberg supports reading a branch at the latest version or time-travel on
>> the main table, but I didn't see any queries that compose the two.
>>
>> Is my understanding correct that there isn't existing SQL for time travel
>> on a specific branch (I assume in this case users need to query the
>> underlying Iceberg metadata to determine a snapshot of interest)?
>>
>> Thanks,
>>
>> Micah
>>
>> [1] https://iceberg.apache.org/docs/latest/spark-queries/
>>
>
> --
> Ryan Blue
> Tabular
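
To make the two timestamp time-travel behaviors from Ryan's message concrete, here is a minimal Python sketch. It is not the reference implementation's code. It assumes the table metadata has been loaded as a plain dict using the spec's field names ("snapshot-log", "snapshots", "current-snapshot-id", "parent-snapshot-id", "timestamp-ms"), and that "snapshot-log" entries are in chronological order. The function names and the example metadata (a rollback followed by a new commit) are hypothetical.

```python
from typing import Optional

# Hypothetical metadata: snapshot 2 was current at t=200, the table was
# rolled back to snapshot 1 at t=300, then snapshot 3 (child of 1) was
# committed at t=400.
EXAMPLE_METADATA = {
    "current-snapshot-id": 3,
    "snapshots": [
        {"snapshot-id": 1, "timestamp-ms": 100},
        {"snapshot-id": 2, "timestamp-ms": 200, "parent-snapshot-id": 1},
        {"snapshot-id": 3, "timestamp-ms": 400, "parent-snapshot-id": 1},
    ],
    "snapshot-log": [
        {"timestamp-ms": 100, "snapshot-id": 1},
        {"timestamp-ms": 200, "snapshot-id": 2},
        {"timestamp-ms": 300, "snapshot-id": 1},  # rollback
        {"timestamp-ms": 400, "snapshot-id": 3},
    ],
}


def snapshot_as_of_log(metadata: dict, ts_ms: int) -> Optional[int]:
    """Option 1: what a query run at ts_ms would have read.

    Returns the snapshot of the last snapshot-log entry at or before ts_ms
    (assumes the log is in chronological order).
    """
    result = None
    for entry in metadata["snapshot-log"]:
        if entry["timestamp-ms"] <= ts_ms:
            result = entry["snapshot-id"]
    return result


def snapshot_as_of_ancestry(metadata: dict, ts_ms: int) -> Optional[int]:
    """Option 2: the ancestor of the current snapshot that existed at ts_ms.

    Walks parent pointers from the current snapshot and returns the first
    ancestor whose commit timestamp is at or before ts_ms.
    """
    by_id = {s["snapshot-id"]: s for s in metadata["snapshots"]}
    current = by_id.get(metadata.get("current-snapshot-id"))
    while current is not None:
        if current["timestamp-ms"] <= ts_ms:
            return current["snapshot-id"]
        current = by_id.get(current.get("parent-snapshot-id"))
    return None


if __name__ == "__main__":
    print(snapshot_as_of_log(EXAMPLE_METADATA, 250))       # 2
    print(snapshot_as_of_ancestry(EXAMPLE_METADATA, 250))  # 1
```

In this example the two options disagree at t=250: the log-based lookup returns snapshot 2, which a job running at that time would have read, while the ancestry walk returns snapshot 1 because snapshot 2 is no longer an ancestor of the current state after the rollback.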