Hi Jack,

Thanks for bringing up this discussion.

> but another schema could be used if the user specifies the schema ID.
I think it is hard for users to specify the schema ID. We don't even have a
metadata table to see how the schema evolved per operation, so unless the
user knows which table metadata file maps to which operation, it will be
hard to find the schema ID.

+1 for Ryan's suggestion of creating a new snapshot file during schema
change operations. I would also consider creating a new snapshot file for
partition evolution and sort order evolution operations as well. Even
though this won't be needed for time travel, it can help our snapshot
tagging and branching feature know a branch's latest schema ID, spec ID,
and sort order, I guess.

Thanks,
Ajantha

On Tue, Nov 23, 2021 at 6:29 AM Jack Ye <yezhao...@gmail.com> wrote:

> Thanks for the reply Ryan, and thanks for the additional information Mayur!
>
> Yes, the example you described is one of the biggest reasons for using the
> snapshot schema. I just brought the other example up because that was the
> conflicting use case I heard. The use case is that the system would always
> query a stable snapshot by referring to a specific snapshot ID. And it
> could perform a much faster rollback to an older version of the snapshot,
> instead of rolling back the entire app, if something is wrong with the new
> data.
>
> I agree that adding an option to use the current schema sounds like the
> approach to go for that specific use case; my proposal was trying to have a
> fix that also works for the second issue. But if we are considering
> committing a snapshot for a metadata operation, then that would be the best
> way to solve the second issue and we are good here.
>
> Best,
> Jack Ye
>
> On Mon, Nov 22, 2021 at 3:41 PM Ryan Blue <b...@tabular.io> wrote:
>
>> Thanks, Jack! I was just thinking about this because I think we want to
>> have support for time travel with the snapshot schema in 0.13.0 if
>> possible, so it's a great time to talk about the behavior.
>>
>> In general, I think that time travel with the snapshot schema is the
>> right option. The use case I think of is where I am trying to debug what
>> was seen at some time -- either for a user to understand why a query
>> produced a certain result, or maybe for an administrator to reproduce a
>> failure case. Those cases require as close as we can get to exactly what
>> was used.
>>
>> To get more specific, say I have a table where I replace a column with a
>> new one using the same name. And then I want to debug a query that happened
>> at some time. If I use the current schema for that time travel, then I'll
>> get completely different results -- probably no rows if I'm filtering or
>> joining by the replaced column. That's really confusing behavior. Maybe
>> there are some cases where we want to use the current schema, but I think
>> users implicitly expect time travel to give the same results as a
>> historical query.
>>
>> Your example may justify adding an option to time travel with the current
>> schema, but I'm not sure when you'd have a production app that you use with
>> time travel. Wouldn't you also roll back to an older version of the query?
>> I just don't see myself being surprised if an updated query doesn't work.
>>
>> For the second issue, I agree with you that the current behavior is
>> weird. But I think the best way to fix it is to simply create a new
>> snapshot at the same time we update the schema. If we create a snapshot
>> with the same manifest list, then we can do everything entirely in metadata
>> and avoid the behavior. We can do this with a quick modification to the
>> TableMetadata mutation.
>>
>> If we solve the second case by adding a rewrite snapshot with no
>> modifications, then I don't think we need to specify a schema. Instead, we
>> can add a flag for whether to time travel with the current or snapshot
>> schema.
>> But I'd wait until we hit a use case that requires this, since I'm
>> skeptical about people needing it in practice.
>>
>> Ryan
>>
>> On Mon, Nov 22, 2021 at 11:52 AM Jack Ye <yezhao...@gmail.com> wrote:
>>
>>> Hi everyone,
>>>
>>> As we are starting to have more use cases around the time travel related
>>> features in Iceberg, a few questions popped up around the correct schema
>>> to use for time travel queries.
>>>
>>> *Latest table schema vs schema at snapshot time*
>>> There was a discussion around this in
>>> https://github.com/apache/iceberg/pull/1508, but I would like to just
>>> revisit it a bit, because in many applications the schema at read time is
>>> fixed, and the same schema might be desirable for querying an old
>>> snapshot. Consider the following case:
>>>
>>> t0: create table sample (id bigint, category string, data string)
>>> t1: insert into sample values (1, 'c1', 'data1'), (2, 'c2', 'data2')
>>> t2: alter table sample add column ts timestamp
>>> t3: insert into sample values (3, 'c3', 'data3', timestamp '2000-01-01')
>>>
>>> The application that reads the table might already be updated to query
>>> columns (id, data, ts), and just expect the value of ts to be null for
>>> data before t2 when it performs a time travel. However, if time travel to
>>> t1 gives data in the old schema, it will not find the ts column and will
>>> fail the query. So things get a lot more complex on the application side,
>>> because it needs to distinguish between query as-of times and use
>>> different logic to handle the data. This gets very convoluted very
>>> quickly as more schema evolutions are added.
>>>
>>> While I do see the great value of using the schema at snapshot time in
>>> many applications and I am all for this feature, I think for some
>>> applications that means we are pushing a lot of the complexity of
>>> handling multiple schemas to the users. Maybe some kind of option is
>>> needed for users to better control the behavior to fit their specific
>>> use cases.
>>>
>>> *Schema for time-based travel*
>>> As of today, we have 2 time travel syntaxes (using Hive syntax as an
>>> example):
>>> - SELECT * FROM table FOR SYSTEM_TIME AS OF timestamp
>>> - SELECT * FROM table FOR SYSTEM_VERSION AS OF version
>>>
>>> For SYSTEM_VERSION based travel, it is pretty clear that we will time
>>> travel to a specific snapshot based on the snapshot ID, and the schema
>>> can be resolved by the solution to the last question. However, for
>>> SYSTEM_TIME based travel, there is ambiguity around the exact schema to
>>> use, because Iceberg does not create a new snapshot for table metadata
>>> operations like schema evolution.
>>>
>>> Consider the same example as above: if the user queries SELECT * FROM
>>> table FOR SYSTEM_TIME AS OF t2, then it is ambiguous whether we should
>>> use the schema of the snapshot at t1, or the updated schema at t2.
>>> Currently the behavior is to use the schema of the snapshot at t1, which
>>> might not be the intention of the user.
>>>
>>> *Thoughts about the solution*
>>> Considering the 2 cases above, it seems to me that the most flexible way
>>> to resolve this is to provide users a way to override the schema used for
>>> time travel queries. The default can be kept as using the snapshot
>>> schema, but another schema could be used if the user specifies the schema
>>> ID. This provides a lot of flexibility for use cases like table migration
>>> and schema evolution coordination across microservices.
>>>
>>> Any thoughts around this?
>>>
>>> Best,
>>> Jack Ye
>>
>> --
>> Ryan Blue
>> Tabular
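[Editor's note] To make the trade-off in Jack's t0-t3 example concrete, here is a minimal Python sketch of the two projection behaviors being debated. This is not the Iceberg API; all names (`project`, the row dicts, the schema lists) are hypothetical, and it only models the semantics.

```python
# Model of Jack's t0-t3 example: a snapshot written before the t2
# ALTER TABLE, read under either the snapshot schema or the current schema.

def project(rows, schema):
    """Project row dicts onto a schema, filling missing columns with None."""
    return [{col: row.get(col) for col in schema} for row in rows]

# Snapshot committed at t1, before the 'ts' column existed.
snapshot_t1_schema = ["id", "category", "data"]
snapshot_t1_rows = [
    {"id": 1, "category": "c1", "data": "data1"},
    {"id": 2, "category": "c2", "data": "data2"},
]

# Current schema after t2: alter table sample add column ts timestamp.
current_schema = ["id", "category", "data", "ts"]

# Time travel with the snapshot schema: 'ts' does not exist in it, so an
# application that selects (id, data, ts) cannot resolve the column.
assert "ts" not in snapshot_t1_schema

# Time travel with the current schema: old rows come back with ts = None,
# which is what Jack's fixed-schema application expects.
rows = project(snapshot_t1_rows, current_schema)
print(rows[0])  # {'id': 1, 'category': 'c1', 'data': 'data1', 'ts': None}
```

Ryan's debugging use case is the mirror image: if the column had been dropped and replaced rather than added, projecting old data through the current schema would silently change results, which is why the snapshot schema is the safer default.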
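[Editor's note] Ryan's fix for the SYSTEM_TIME ambiguity, committing an empty snapshot (same manifest list, new schema ID) at schema-change time, can be sketched as a timestamp-to-snapshot lookup. Again a hypothetical model, not the Iceberg implementation; the timestamps and IDs are made up.

```python
import bisect

# (commit_timestamp, snapshot_id, schema_id), ordered by commit time.
# The s2 entry is the proposed empty snapshot committed by the t2 schema
# change: no data change, but it records the new schema ID.
snapshots = [
    (100, "s1", 0),  # t1: insert, original schema
    (200, "s2", 1),  # t2: add column ts -> empty snapshot, new schema
    (300, "s3", 1),  # t3: insert with the new schema
]

def snapshot_as_of(ts):
    """Resolve SYSTEM_TIME AS OF: latest snapshot with commit time <= ts."""
    i = bisect.bisect_right([t for t, _, _ in snapshots], ts)
    if i == 0:
        raise ValueError("no snapshot at or before %s" % ts)
    return snapshots[i - 1]

# Without the s2 entry, AS OF t2 would fall back to s1 and schema 0;
# with it, the query resolves to the post-evolution schema.
print(snapshot_as_of(200))  # (200, 's2', 1)
print(snapshot_as_of(150))  # (100, 's1', 0)
```

Because the empty snapshot reuses the existing manifest list, the whole fix stays in metadata, matching Ryan's point that no data files need to be rewritten.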
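[Editor's note] Jack's override proposal reduces to a small resolution rule: default to the snapshot's schema, but let the reader pin an explicit schema ID. A hypothetical sketch of that rule (the schema registry and function name are invented for illustration):

```python
# Table's schema history, keyed by schema ID (as in Jack's example:
# schema 0 before the t2 evolution, schema 1 after).
schemas = {
    0: ["id", "category", "data"],
    1: ["id", "category", "data", "ts"],
}

def schema_for_scan(snapshot_schema_id, override_schema_id=None):
    """Resolve the read schema: an explicit override wins, otherwise the
    schema recorded by the snapshot being time traveled to."""
    chosen = (override_schema_id
              if override_schema_id is not None
              else snapshot_schema_id)
    return schemas[chosen]

# Default: historical query semantics, per Ryan's debugging use case.
assert schema_for_scan(0) == ["id", "category", "data"]
# Override: a fixed-schema application pins the current schema ID.
assert schema_for_scan(0, override_schema_id=1)[-1] == "ts"
```

This is effectively the flag Ryan suggests (current vs snapshot schema), generalized to any schema ID, which is where Ajantha's concern applies: without a metadata table exposing the schema history, users have no easy way to discover which ID to pin.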