Hi,

I have a question about how schema evolution interacts with time travel
in Iceberg.
When I run a time-travel query against a table whose schema has changed,
I expect the result to be structured using the schema that was in effect
at that snapshot. Instead, it is structured using the current schema.

Is this the expected behavior?
I think it would be useful to be able to query the data in its original
shape. What do you think?

A code snippet follows. Environment: Iceberg 0.10.0, Spark 3.0.1.

sql("create table iceberg.test.schema_timetravel (id int, name string) using iceberg")
sql("insert into table iceberg.test.schema_timetravel values(1, 'aaa')")
sql("insert into table iceberg.test.schema_timetravel values(2, 'bbb')")
sql("select * from iceberg.test.schema_timetravel").show()
+---+-------+
| id|   name|
+---+-------+
|  1|    aaa|
|  2|    bbb|
+---+-------+
sql("select * from iceberg.test.schema_timetravel.history").show()
+--------------------+-------------------+-------------------+-------------------+
|     made_current_at|        snapshot_id|          parent_id|is_current_ancestor|
+--------------------+-------------------+-------------------+-------------------+
|2020-12-14 22:44:...|2849000299888498484|               null|               true|
|2020-12-14 22:44:...|5610242355805640211|2849000299888498484|               true|
+--------------------+-------------------+-------------------+-------------------+
sql("alter table iceberg.test.schema_timetravel drop column name")
sql("select * from iceberg.test.schema_timetravel").show()
+---+
| id|
+---+
|  1|
|  2|
+---+
spark.read.format("iceberg").option("snapshot-id", 2849000299888498484L).load("test.schema_timetravel").show()
// Expected: data shown in the schema of that snapshot: (1, aaa)
// Actual: data shown in the current schema: (1)
+---+
| id|
+---+
|  1|
+---+

Best regards,
Tianyi
