Thanks, Jack! I was just thinking about this because I think we want to have support for time travel with the snapshot schema in 0.13.0 if possible, so it's a great time to talk about the behavior.
In general, I think that time travel with the snapshot schema is the right option. The use case I think of is where I am trying to debug what was seen at some time -- either for a user to understand why a query produced a certain result, or maybe for an administrator to reproduce a failure case. Those cases require as close as we can get to exactly what was used. To get more specific, say I have a table where I replace a column with a new one using the same name. And then I want to debug a query that happened at some time. If I use the current schema for that time travel, then I'll get completely different results -- probably no rows if I'm filtering or joining by the replaced column. That's really confusing behavior. Maybe there are some cases where we want to use the current schema, but I think users implicitly expect time travel to give the same results as a historical query. Your example may justify adding an option to time travel with the current schema, but I'm not sure when you'd have a production app that you use with time travel. Wouldn't you also roll back to an older version of the query? I just don't see myself being surprised if an updated query doesn't work. For the second issue, I agree with you that the current behavior is weird. But I think the best way to fix it is to simply create a new snapshot at the same time we update the schema. If we create a snapshot with the same manifest list, then we can do everything entirely in metadata and avoid the behavior. We can do this with a quick modification to the TableMetadata mutation. If we solve the second case by adding a rewrite snapshot with no modifications, then I don't think we need to specify a schema. Instead, we can add a flag for whether to time travel with the current or snapshot schema. But, I'd wait until we hit a use case that requires this, since I'm skeptical about people needing it in practice. Ryan On Mon, Nov 22, 2021 at 11:52 AM Jack Ye <yezhao...@gmail.com> wrote: > Hi everyone, > > As we are starting to have more use cases around the time travel related > features in Iceberg, a few questions popped up around the correct schema to > use for time travel queries. > > *Latest table schema vs schema at snapshot time* > There was a discussion around this in > https://github.com/apache/iceberg/pull/1508, but I would like to just > revisit it a bit, because in many applications the schema at read time is > fixed, and the same schema might be desirable for querying an old snapshot. > Considering the following case: > > t0: create table sample (id bigint, category string, data string) > t1: insert into sample (1, 'c1', data1'), (2, 'c2', 'data2') > t2: alter table sample add column ts timestamp > t3: insert into sample (3, 'c3', 'dat3', timestamp '2000-01-01') > > The application that reads the table might already be updated to query > columns (id, data, ts), and just expect the value of ts to be null for data > before t2 when it performs a time travel. However, if time travel to t1 > gives data in the old schema, it will not find the ts column and fail the > query. So things get a lot more complex at application side because it > needs to distinguish between the query as-of-time and use different logic > to handle data. This gets very convoluted very quickly as more schema > evolutions are added. > > While I do see the great value of using schema at snapshot time in many > applications and I am all for this feature, I think for some applications > that means we are pushing a lot of complexity of handling multi-schema to > the users. Maybe some kind of option is needed for users to better control > the behavior to fit their specific use cases. > > *Schema for time-based travel* > As of today, we have 2 time travel syntaxes (use Hive syntax as example): > - SELECT * FROM table FOR SYSTEM_TIME AS OF timestamp > - SELECT * FROM table FOR SYSTEM_VERSION AS OF version > > For SYSTEM_VERSION based travel, it is pretty clear that we will time > travel to a specific snapshot based on the snapshot ID, and the schema can > be resolved by the solution of the last question. However, for SYSTEM_TIME > based travel, there is ambiguity around the exact schema to use, because > Iceberg does not create a new snapshot for table metadata operations like > schema evolution. > > Consider the same use case example as above, If the user queries SELECT * > FROM table FOR SYSTEM_TIME AS OF t2, then it is ambiguous if we should use > the schema of the snapshot at t1, or the updated schema at t2. Currently > the behavior is to use a schema of snapshot at t1, which might not be the > intention of the user. > > *Thoughts about the solution* > Considering the 2 cases above, it seems to me that the most flexible way > to resolve this is to provide users a way to override the schema to use for > time travel queries. The default can be kept as using the snapshot schema, > but another schema could be used if the user specifies the schema ID. This > provides a lot of flexibility for use cases like table migration and schema > evolution coordination across microservices. > > Any thoughts around this? > > Best, > Jack Ye > > > -- Ryan Blue Tabular