Re: About schema evolution with time travel.

Tianyi Wang Mon, 14 Dec 2020 18:03:11 -0800

Hi Wing Yew,

Thanks for the pointer to the PR. That's what I was looking for.
I will watch #1508 and #1029, and let's continue the discussion on GitHub.


Best,
Tianyi

On Tue, Dec 15, 2020 at 3:44 AM Wing Yew Poon <wyp...@cloudera.com.invalid>
wrote:

> Hi Tianyi,
> The behavior you found is indeed the current behavior in Iceberg. I too
> found it unexpected. I have a PR to address this:
> https://github.com/apache/iceberg/pull/1508. Due to other work, I had not
> followed up on this for a while, but I am returning to it now.
> - Wing Yew
>
>
> On Mon, Dec 14, 2020 at 6:27 AM Cap Kurmagati <capkurmag...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I have a question regarding the behavior of schema evolution with
>> time-travel in Iceberg.
>> When I run a time-travel query against a table whose schema has changed,
>> I expect the result to be structured using the schema of the snapshot
>> being read. But it turned out to be structured using the current schema.
>>
>> Is this the expected behavior?
>> I think it would be nice to be able to query the data in its original
>> shape. What do you think?
>>
>> Code snippet as follows. Environment: Iceberg 0.10.0, Spark 3.0.1
>>
>> sql("create table iceberg.test.schema_timetravel (id int, name string) using iceberg")
>> sql("insert into table iceberg.test.schema_timetravel values(1, 'aaa')")
>> sql("insert into table iceberg.test.schema_timetravel values(2, 'bbb')")
>> sql("select * from iceberg.test.schema_timetravel").show()
>> +---+-------+
>> | id|   name|
>> +---+-------+
>> |  1|    aaa|
>> |  2|    bbb|
>> +---+-------+
>> sql("select * from iceberg.test.schema_timetravel.history").show()
>> +--------------------+-------------------+-------------------+-------------------+
>> |     made_current_at|        snapshot_id|          parent_id|is_current_ancestor|
>> +--------------------+-------------------+-------------------+-------------------+
>> |2020-12-14 22:44:...|2849000299888498484|               null|               true|
>> |2020-12-14 22:44:...|5610242355805640211|2849000299888498484|               true|
>> +--------------------+-------------------+-------------------+-------------------+
>> sql("alter table iceberg.test.schema_timetravel drop column name")
>> sql("select * from iceberg.test.schema_timetravel").show()
>> +---+
>> | id|
>> +---+
>> |  1|
>> |  2|
>> +---+
>> spark.read.format("iceberg").option("snapshot-id", 2849000299888498484L).load("test.schema_timetravel").show()
>> // Expect: show data in the previous schema: (1, aaa)
>> // Result: show data in the current schema: (1)
>> +---+
>> | id|
>> +---+
>> |  1|
>> +---+
>>
>> Best regards,
>> Tianyi
>>
>
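
The difference between the reported behavior and the expected one can be sketched with a toy model. This is not Iceberg or Spark code: `Snapshot`, `project`, `readWithCurrentSchema`, and `readWithSnapshotSchema` are hypothetical names invented for illustration, and the model assumes that each snapshot remembers the schema that was current when it was committed.

```scala
// Toy model only: none of these types are Iceberg or Spark APIs.
object SchemaTimeTravelSketch {
  // A stored row: column name -> value, as written at commit time.
  type Row = Map[String, Any]

  // Hypothetical snapshot record: the rows visible at that snapshot plus
  // the ordered column names that were current when it was committed.
  final case class Snapshot(id: Long, schema: Seq[String], rows: Seq[Row])

  // Project each row onto the given columns; absent columns become null.
  def project(rows: Seq[Row], columns: Seq[String]): Seq[Seq[Any]] =
    rows.map(row => columns.map(col => row.getOrElse(col, null)))

  // Behavior reported in the thread: snapshot data, current table schema.
  def readWithCurrentSchema(s: Snapshot, currentSchema: Seq[String]): Seq[Seq[Any]] =
    project(s.rows, currentSchema)

  // Behavior asked for in the thread: snapshot data, the snapshot's own schema.
  def readWithSnapshotSchema(s: Snapshot): Seq[Seq[Any]] =
    project(s.rows, s.schema)

  def main(args: Array[String]): Unit = {
    // First snapshot from the thread: one row, written while "name" still existed.
    val first = Snapshot(
      2849000299888498484L,
      Seq("id", "name"),
      Seq(Map("id" -> 1, "name" -> "aaa"))
    )
    // Table schema after "alter table ... drop column name".
    val currentSchema = Seq("id")

    println(readWithCurrentSchema(first, currentSchema)) // only the id column
    println(readWithSnapshotSchema(first))               // id and name columns
  }
}
```

Running `main` prints the two projections of the first snapshot's row one after the other, mirroring the "Result" and "Expect" comments in the quoted snippet.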
