We have some use cases where data and source code are linked. In such cases, time travel using the snapshot's own schema is desirable, because the source code expects the schema corresponding to that snapshot. The current schema may be useful in some cases, but it has correctness issues (as Ryan mentioned earlier) and works only when the current schema (or a future schema) is backwards compatible with the snapshot schema.
Thanks,
Mayur

From: Ryan Blue <b...@tabular.io>
Sent: Monday, November 22, 2021 6:41 PM
To: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Read schema to use during time travel

Thanks, Jack! I was just thinking about this because I think we want to have support for time travel with the snapshot schema in 0.13.0 if possible, so it's a great time to talk about the behavior.

In general, I think that time travel with the snapshot schema is the right option. The use case I think of is where I am trying to debug what was seen at some time -- either for a user to understand why a query produced a certain result, or maybe for an administrator to reproduce a failure case. Those cases require getting as close as we can to exactly what was used.

To get more specific, say I have a table where I replace a column with a new one using the same name, and then I want to debug a query that happened at some time. If I use the current schema for that time travel, then I'll get completely different results -- probably no rows if I'm filtering or joining by the replaced column. That's really confusing behavior.

Maybe there are some cases where we want to use the current schema, but I think users implicitly expect time travel to give the same results as a historical query. Your example may justify adding an option to time travel with the current schema, but I'm not sure when you'd have a production app that uses time travel this way. Wouldn't you also roll back to an older version of the query? I just don't see myself being surprised if an updated query doesn't work.

For the second issue, I agree with you that the current behavior is weird. But I think the best way to fix it is to simply create a new snapshot at the same time we update the schema. If we create a snapshot with the same manifest list, then we can do everything entirely in metadata and avoid the behavior. We can do this with a quick modification to the TableMetadata mutation.
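Ryan's replaced-column example can be made concrete with a small sketch. Iceberg resolves columns in data files by field ID rather than by name, so dropping a column and re-adding one with the same name produces a new field ID. The sketch below (plain Python; the dicts and the read_row helper are illustrative, not Iceberg's actual metadata classes) shows why reading an old snapshot through the current schema then yields only nulls for the replaced column:

```python
# Illustrative sketch: columns are matched by field ID, so a replaced
# column ("drop category; add category") gets a new ID (2 -> 4 here).

# Schema at snapshot time: "category" has field ID 2.
snapshot_schema = {"id": 1, "category": 2}

# Current schema: "category" was dropped and re-added, so it has ID 4.
current_schema = {"id": 1, "category": 4}

# A data file written at snapshot time stores values keyed by field ID.
data_file_row = {1: 100, 2: "c1"}

def read_row(row, schema):
    """Project a stored row through a read schema: columns are matched
    by field ID, and an ID missing from the file reads as null (None)."""
    return {name: row.get(fid) for name, fid in schema.items()}

# Reading the old snapshot with its own schema recovers the data...
print(read_row(data_file_row, snapshot_schema))  # {'id': 100, 'category': 'c1'}

# ...but through the current schema, field ID 4 is absent from the old
# file, so "category" reads as null and any filter or join on it
# matches nothing -- Ryan's "probably no rows" outcome.
print(read_row(data_file_row, current_schema))   # {'id': 100, 'category': None}
```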
If we solve the second case by adding a rewrite snapshot with no modifications, then I don't think we need to specify a schema. Instead, we can add a flag for whether to time travel with the current or snapshot schema. But I'd wait until we hit a use case that requires this, since I'm skeptical about people needing it in practice.

Ryan

On Mon, Nov 22, 2021 at 11:52 AM Jack Ye <yezhao...@gmail.com> wrote:

Hi everyone,

As we are starting to have more use cases around the time travel features in Iceberg, a few questions popped up around the correct schema to use for time travel queries.

Latest table schema vs schema at snapshot time

There was a discussion around this in https://github.com/apache/iceberg/pull/1508, but I would like to revisit it a bit, because in many applications the schema at read time is fixed, and the same schema might be desirable for querying an old snapshot. Consider the following case:

t0: create table sample (id bigint, category string, data string)
t1: insert into sample values (1, 'c1', 'data1'), (2, 'c2', 'data2')
t2: alter table sample add column ts timestamp
t3: insert into sample values (3, 'c3', 'data3', timestamp '2000-01-01')

The application that reads the table might already be updated to query columns (id, data, ts), and simply expect the value of ts to be null for data written before t2 when it performs a time travel. However, if time travel to t1 returns data in the old schema, the query will fail because the ts column does not exist. The application side then becomes a lot more complex, because it needs to distinguish between queries by as-of time and use different logic for each schema. This gets very convoluted very quickly as more schema evolutions are added. While I do see the great value of using the schema at snapshot time in many applications, and I am all for this feature, I think for some applications this pushes a lot of the complexity of handling multiple schemas onto the users.
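Jack's timeline can be sketched in a few lines. This is a minimal model (the select helper and its error behavior are hypothetical, not the Iceberg API) contrasting the two read-schema choices for an app that queries (id, data, ts) against the t1 snapshot:

```python
# Minimal sketch of the t0-t3 example: an app queries (id, data, ts)
# against the snapshot written at t1, before ts existed.

schema_at_t1 = ["id", "category", "data"]            # snapshot schema
latest_schema = ["id", "category", "data", "ts"]     # after t2's ADD COLUMN

# Rows as written at t1; they carry no ts value.
rows_at_t1 = [{"id": 1, "data": "data1"}, {"id": 2, "data": "data2"}]

def select(rows, read_schema, columns):
    """Project rows through a read schema. A column added after the data
    was written reads as null, but a column unknown to the read schema
    is an error (the engine cannot resolve it)."""
    for c in columns:
        if c not in read_schema:
            raise KeyError(f"column not found: {c}")
    return [{c: row.get(c) for c in columns} for row in rows]

# With the latest schema, the app's query works and ts is simply null:
print(select(rows_at_t1, latest_schema, ["id", "data", "ts"]))

# With the snapshot schema, the same query fails, since ts did not
# exist at t1 -- the failure mode Jack describes.
try:
    select(rows_at_t1, schema_at_t1, ["id", "data", "ts"])
except KeyError as e:
    print("query failed:", e)
```

This is the trade-off in the email: the snapshot schema reproduces history faithfully, while the latest schema keeps a fixed-schema application simple.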
Maybe some kind of option is needed for users to better control the behavior to fit their specific use cases.

Schema for time-based travel

As of today, we have two time travel syntaxes (using Hive syntax as an example):

- SELECT * FROM table FOR SYSTEM_TIME AS OF timestamp
- SELECT * FROM table FOR SYSTEM_VERSION AS OF version

For SYSTEM_VERSION-based travel, it is pretty clear that we will time travel to a specific snapshot based on the snapshot ID, and the schema can be resolved by the solution to the last question. However, for SYSTEM_TIME-based travel, there is ambiguity around the exact schema to use, because Iceberg does not create a new snapshot for table metadata operations like schema evolution. Consider the same example as above: if the user queries SELECT * FROM table FOR SYSTEM_TIME AS OF t2, then it is ambiguous whether we should use the schema of the snapshot at t1 or the updated schema at t2. Currently the behavior is to use the schema of the snapshot at t1, which might not be the intention of the user.

Thoughts about the solution

Considering the 2 cases above, it seems to me that the most flexible way to resolve this is to provide users a way to override the schema used for time travel queries. The default can be kept as using the snapshot schema, but another schema could be used if the user specifies the schema ID. This provides a lot of flexibility for use cases like table migration and schema evolution coordination across microservices.

Any thoughts around this?

Best,
Jack Ye

--
Ryan Blue
Tabular
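The SYSTEM_TIME ambiguity comes from schema changes and snapshots living on separate timelines. A small sketch (hypothetical tuples standing in for Iceberg's snapshot list and schema log, not real metadata classes) shows how resolving AS OF t2 lands on a snapshot whose schema predates the schema in effect at t2:

```python
# Sketch of the SYSTEM_TIME ambiguity: the ADD COLUMN at t2 records a
# new schema but creates no snapshot, so the snapshot chosen for
# "AS OF t2" still carries the older schema.

# (timestamp, snapshot_id, schema_id) for each snapshot commit.
snapshots = [(1, "s1", 0), (3, "s3", 1)]   # inserts at t1 and t3

# (timestamp, schema_id) for each schema change; the t2 ADD COLUMN
# produced schema 1 with no accompanying snapshot.
schema_log = [(0, 0), (2, 1)]

def snapshot_as_of(ts):
    """Latest snapshot committed at or before ts."""
    return max((s for s in snapshots if s[0] <= ts), key=lambda s: s[0])

def schema_as_of(ts):
    """Latest schema in effect at or before ts."""
    return max((s for s in schema_log if s[0] <= ts), key=lambda s: s[0])[1]

# Time travel to t2 resolves to snapshot s1, whose schema ID is 0...
_, snap_id, snap_schema = snapshot_as_of(2)

# ...yet the table schema as of t2 is already schema 1. Which one the
# engine should read with is exactly the question in the thread.
print(snap_id, snap_schema, schema_as_of(2))   # s1 0 1
```

Ryan's proposed fix (committing an empty rewrite snapshot together with every schema update) removes the gap by ensuring the two timelines always advance together.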