We have some use cases where data and source code are linked. In such cases, time travel using the snapshot's own schema is desirable, because the source code expects the schema corresponding to that snapshot. The current schema may be useful in some cases, but it has correctness issues (as Ryan mentioned earlier) and works only when the current schema (or a future schema) is backwards compatible with the snapshot schema.
Thanks,
Mayur

From: Ryan Blue <b...@tabular.io>
Sent: Monday, November 22, 2021 6:41 PM
To: Iceberg Dev List <dev@iceberg.apache.org>
Subject: Re: Read schema to use during time travel

Thanks, Jack! I was just thinking about this because I think we want to have support for time travel with the snapshot schema in 0.13.0 if possible, so it's a great time to talk about the behavior.

In general, I think that time travel with the snapshot schema is the right option. The use case I think of is where I am trying to debug what was seen at some time -- either for a user to understand why a query produced a certain result, or maybe for an administrator to reproduce a failure case. Those cases require getting as close as we can to exactly what was used.

To get more specific, say I have a table where I replace a column with a new one using the same name, and then I want to debug a query that happened at some time. If I use the current schema for that time travel, then I'll get completely different results -- probably no rows if I'm filtering or joining by the replaced column. That's really confusing behavior.

Maybe there are some cases where we want to use the current schema, but I think users implicitly expect time travel to give the same results as a historical query. Your example may justify adding an option to time travel with the current schema, but I'm not sure when you'd have a production app that uses time travel this way. Wouldn't you also roll back to an older version of the query? I just don't see myself being surprised if an updated query doesn't work.

For the second issue, I agree with you that the current behavior is weird. But I think the best way to fix it is to simply create a new snapshot at the same time we update the schema. If we create a snapshot with the same manifest list, then we can do everything entirely in metadata and avoid the behavior. We can do this with a quick modification to the TableMetadata mutation.
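Ryan's replaced-column example can be made concrete with a small sketch. Iceberg resolves columns in data files by field ID rather than by name, so dropping a column and re-adding one with the same name produces a new field ID. The sketch below (plain Python; the dicts and the read_row helper are illustrative, not Iceberg's actual metadata classes) shows why reading an old snapshot through the current schema then yields only nulls for the replaced column:

```python
# Illustrative sketch: columns are matched by field ID, so a replaced
# column ("drop category; add category") gets a new ID (2 -> 4 here).

# Schema at snapshot time: "category" has field ID 2.
snapshot_schema = {"id": 1, "category": 2}

# Current schema: "category" was dropped and re-added, so it has ID 4.
current_schema = {"id": 1, "category": 4}

# A data file written at snapshot time stores values keyed by field ID.
data_file_row = {1: 100, 2: "c1"}

def read_row(row, schema):
    """Project a stored row through a read schema: columns are matched
    by field ID, and an ID missing from the file reads as null (None)."""
    return {name: row.get(fid) for name, fid in schema.items()}

# Reading the old snapshot with its own schema recovers the data...
print(read_row(data_file_row, snapshot_schema))  # {'id': 100, 'category': 'c1'}

# ...but through the current schema, field ID 4 is absent from the old
# file, so "category" reads as null and any filter or join on it
# matches nothing -- Ryan's "probably no rows" outcome.
print(read_row(data_file_row, current_schema))   # {'id': 100, 'category': None}
```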
If we solve the second case by adding a rewrite snapshot with no modifications, then I don't think we need to specify a schema. Instead, we can add a flag for whether to time travel with the current or snapshot schema. But I'd wait until we hit a use case that requires this, since I'm skeptical about people needing it in practice.

Ryan

On Mon, Nov 22, 2021 at 11:52 AM Jack Ye <yezhao...@gmail.com> wrote:

Hi everyone,

As we are starting to have more use cases around the time travel features in Iceberg, a few questions popped up around the correct schema to use for time travel queries.

Latest table schema vs schema at snapshot time

There was a discussion around this in https://github.com/apache/iceberg/pull/1508, but I would like to revisit it a bit, because in many applications the schema at read time is fixed, and the same schema might be desirable for querying an old snapshot. Consider the following case:

t0: create table sample (id bigint, category string, data string)
t1: insert into sample values (1, 'c1', 'data1'), (2, 'c2', 'data2')
t2: alter table sample add column ts timestamp
t3: insert into sample values (3, 'c3', 'data3', timestamp '2000-01-01')

The application that reads the table might already be updated to query columns (id, data, ts), and simply expect the value of ts to be null for data written before t2 when it performs a time travel. However, if time travel to t1 returns data in the old schema, the query will fail because the ts column does not exist. The application side then becomes a lot more complex, because it needs to distinguish between queries by as-of time and use different logic for each schema. This gets very convoluted very quickly as more schema evolutions are added. While I do see the great value of using the schema at snapshot time in many applications, and I am all for this feature, I think for some applications this pushes a lot of the complexity of handling multiple schemas onto the users.
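Jack's timeline can be sketched in a few lines. This is a minimal model (the select helper and its error behavior are hypothetical, not the Iceberg API) contrasting the two read-schema choices for an app that queries (id, data, ts) against the t1 snapshot:

```python
# Minimal sketch of the t0-t3 example: an app queries (id, data, ts)
# against the snapshot written at t1, before ts existed.

schema_at_t1 = ["id", "category", "data"]            # snapshot schema
latest_schema = ["id", "category", "data", "ts"]     # after t2's ADD COLUMN

# Rows as written at t1; they carry no ts value.
rows_at_t1 = [{"id": 1, "data": "data1"}, {"id": 2, "data": "data2"}]

def select(rows, read_schema, columns):
    """Project rows through a read schema. A column added after the data
    was written reads as null, but a column unknown to the read schema
    is an error (the engine cannot resolve it)."""
    for c in columns:
        if c not in read_schema:
            raise KeyError(f"column not found: {c}")
    return [{c: row.get(c) for c in columns} for row in rows]

# With the latest schema, the app's query works and ts is simply null:
print(select(rows_at_t1, latest_schema, ["id", "data", "ts"]))

# With the snapshot schema, the same query fails, since ts did not
# exist at t1 -- the failure mode Jack describes.
try:
    select(rows_at_t1, schema_at_t1, ["id", "data", "ts"])
except KeyError as e:
    print("query failed:", e)
```

This is the trade-off in the email: the snapshot schema reproduces history faithfully, while the latest schema keeps a fixed-schema application simple.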
Maybe some kind of option is needed for users to better control the behavior to fit their specific use cases.

Schema for time-based travel

As of today, we have two time travel syntaxes (using Hive syntax as an example):

- SELECT * FROM table FOR SYSTEM_TIME AS OF timestamp
- SELECT * FROM table FOR SYSTEM_VERSION AS OF version

For SYSTEM_VERSION-based travel, it is pretty clear that we will time travel to a specific snapshot based on the snapshot ID, and the schema can be resolved by the solution to the last question. However, for SYSTEM_TIME-based travel, there is ambiguity around the exact schema to use, because Iceberg does not create a new snapshot for table metadata operations like schema evolution. Consider the same example as above: if the user queries SELECT * FROM table FOR SYSTEM_TIME AS OF t2, then it is ambiguous whether we should use the schema of the snapshot at t1 or the updated schema at t2. Currently the behavior is to use the schema of the snapshot at t1, which might not be the intention of the user.

Thoughts about the solution

Considering the 2 cases above, it seems to me that the most flexible way to resolve this is to provide users a way to override the schema used for time travel queries. The default can be kept as using the snapshot schema, but another schema could be used if the user specifies the schema ID. This provides a lot of flexibility for use cases like table migration and schema evolution coordination across microservices.

Any thoughts around this?

Best,
Jack Ye

--
Ryan Blue
Tabular
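The SYSTEM_TIME ambiguity comes from schema changes and snapshots living on separate timelines. A small sketch (hypothetical tuples standing in for Iceberg's snapshot list and schema log, not real metadata classes) shows how resolving AS OF t2 lands on a snapshot whose schema predates the schema in effect at t2:

```python
# Sketch of the SYSTEM_TIME ambiguity: the ADD COLUMN at t2 records a
# new schema but creates no snapshot, so the snapshot chosen for
# "AS OF t2" still carries the older schema.

# (timestamp, snapshot_id, schema_id) for each snapshot commit.
snapshots = [(1, "s1", 0), (3, "s3", 1)]   # inserts at t1 and t3

# (timestamp, schema_id) for each schema change; the t2 ADD COLUMN
# produced schema 1 with no accompanying snapshot.
schema_log = [(0, 0), (2, 1)]

def snapshot_as_of(ts):
    """Latest snapshot committed at or before ts."""
    return max((s for s in snapshots if s[0] <= ts), key=lambda s: s[0])

def schema_as_of(ts):
    """Latest schema in effect at or before ts."""
    return max((s for s in schema_log if s[0] <= ts), key=lambda s: s[0])[1]

# Time travel to t2 resolves to snapshot s1, whose schema ID is 0...
_, snap_id, snap_schema = snapshot_as_of(2)

# ...yet the table schema as of t2 is already schema 1. Which one the
# engine should read with is exactly the question in the thread.
print(snap_id, snap_schema, schema_as_of(2))   # s1 0 1
```

Ryan's proposed fix (committing an empty rewrite snapshot together with every schema update) removes the gap by ensuring the two timelines always advance together.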