Hi,
I have a question regarding the behavior of schema evolution with
time-travel in Iceberg.
When I run a time-travel query against a table whose schema has changed,
I expect the result to be structured using the schema that was in effect
at that snapshot. Instead, it is structured using the current schema.
Is this the expected behavior?
I think it would be nice to be able to query the data in its original
shape. What do you think?
Code snippet as follows. Environment: Iceberg 0.10.0, Spark 3.0.1
sql("create table iceberg.test.schema_timetravel (id int, name string) using iceberg")
sql("insert into table iceberg.test.schema_timetravel values(1, 'aaa')")
sql("insert into table iceberg.test.schema_timetravel values(2, 'bbb')")
sql("select * from iceberg.test.schema_timetravel").show()
+---+-------+
| id|   name|
|  1|    aaa|
|  2|    bbb|
+---+-------+
sql("select * from iceberg.test.schema_timetravel.history").show()
+--------------------+-------------------+-------------------+-------------------+
|     made_current_at|        snapshot_id|          parent_id|is_current_ancestor|
+--------------------+-------------------+-------------------+-------------------+
|2020-12-14 22:44:...|2849000299888498484|               null|               true|
|2020-12-14 22:44:...|5610242355805640211|2849000299888498484|               true|
+--------------------+-------------------+-------------------+-------------------+
sql("alter table iceberg.test.schema_timetravel drop column name")
sql("select * from iceberg.test.schema_timetravel").show()
+---+
| id|
+---+
|  1|
|  2|
+---+
spark.read.format("iceberg").option("snapshot-id",
2849000299888498484L).load("test.schema_timetravel").show()
// Expect: show data in the previous schema: (1, aaa)
// Result: show data in the current schema: (1)
+---+
| id|
+---+
|  1|
+---+
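
To make the two possible semantics concrete, here is a toy model in plain
Scala. It is only a sketch: it does not use any Iceberg or Spark API, every
name in it (TimeTravelModel, Snapshot, readWithCurrentSchema,
readWithSnapshotSchema) is made up for illustration, and it matches columns
by name for simplicity rather than by field ID.

```scala
// Toy model only: none of these names are Iceberg or Spark APIs.
object TimeTravelModel {
  type Row = Map[String, Any]

  // A snapshot remembers its rows and the column list in effect when it was written.
  final case class Snapshot(id: Long, schema: Seq[String], rows: Seq[Row])

  // Project every row onto the given column list, in column order.
  def project(rows: Seq[Row], schema: Seq[String]): Seq[Seq[Any]] =
    rows.map(row => schema.map(col => row.getOrElse(col, null)))

  // Observed behavior: snapshot rows shaped by the table's current schema.
  def readWithCurrentSchema(s: Snapshot, currentSchema: Seq[String]): Seq[Seq[Any]] =
    project(s.rows, currentSchema)

  // Expected behavior: snapshot rows shaped by the schema at snapshot time.
  def readWithSnapshotSchema(s: Snapshot): Seq[Seq[Any]] =
    project(s.rows, s.schema)

  def main(args: Array[String]): Unit = {
    val first = Snapshot(
      2849000299888498484L,
      Seq("id", "name"),
      Seq(Map("id" -> 1, "name" -> "aaa"))
    )
    val currentSchema = Seq("id") // after "drop column name"

    println(readWithCurrentSchema(first, currentSchema)) // List(List(1))
    println(readWithSnapshotSchema(first))               // List(List(1, aaa))
  }
}
```

The first call corresponds to what I get today, the second to what I
expected from the snapshot read.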
Best regards,
Tianyi