Is there anything that Iceberg needs to do differently here? We've had
requests to support reordering fields with `ADD COLUMN ... AFTER other_col`
and `UPDATE COLUMN col BEFORE other_col`. Otherwise, do you think we need
to change the internal checks?
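
For anyone following along, the id-based resolution described in Gautam's
message below can be sketched roughly as follows. This is plain Python, not
Iceberg code: the `Field` class and the `align` function are hypothetical
names made up for illustration, and the real compatibility checks are more
involved. The sketch resolves incoming field names to field IDs and emits
values in table order, recursing into nested structs; names the table schema
has no ID for are reported so they can be added through a schema update first.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Field:
    """Hypothetical stand-in for a schema field (not an Iceberg class)."""
    field_id: int
    name: str
    children: Optional[List["Field"]] = None  # set only for struct fields


def align(record: Dict[str, Any], fields: List[Field],
          prefix: str = "") -> Tuple[Dict[int, Any], List[str]]:
    """Key an incoming record by field ID, in table order.

    Returns (aligned, unknown): `aligned` maps field ID -> value following
    the table's field order; `unknown` lists dotted names that the table
    schema has no ID for yet.
    """
    by_name = {f.name: f for f in fields}
    unknown = [prefix + name for name in record if name not in by_name]
    aligned: Dict[int, Any] = {}
    for f in fields:  # iterate in table order, not incoming order
        value = record.get(f.name)  # assumption: absent fields become None
        if f.children is not None and isinstance(value, dict):
            value, nested_unknown = align(value, f.children, prefix + f.name + ".")
            unknown.extend(nested_unknown)
        aligned[f.field_id] = value
    return aligned, unknown


# Table struct(c, b, a) versus inbound struct(a, b, c), as in Shone's example.
table = [Field(1, "c"), Field(2, "b"), Field(3, "a")]
aligned, unknown = align({"a": 10, "b": 20, "c": 30}, table)
```

Here `aligned` comes out keyed 1, 2, 3 regardless of the inbound order, and
`unknown` is empty; an added inbound field would show up in `unknown` instead.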

On Thu, Sep 26, 2019 at 1:23 AM Gautam <gautamkows...@gmail.com> wrote:

> Shone and I synced offline, but I wanted to circle back here so others can
> hopefully benefit, and so those with more experience can correct me if
> there's a better way to achieve this.
>
> *Problem*:
>   The use case is that incoming data has fields out of order w.r.t.
> already ingested data in Iceberg. The same scenario applies to nested
> columns (e.g. a sub-struct whose fields are out of order). Incoming data
> might also have added fields. The issue is that if the data is ingested
> as is, Iceberg will complain in its compatibility checks, as it should.
>
> *Solution*:
>   Iceberg doesn't depend on field names or on the natural order of fields.
> It uses IDs to keep track of schema fields. So if one wants to enforce
> evolution rules correctly, she should first go back to the underlying
> Iceberg schema, apply schema transformation rules using the Iceberg Schema
> Update API, and commit the schema changes to the underlying table. Once
> this is done, Iceberg will have created a new version of the schema with
> new IDs allotted to the added fields. It also accounts for a different
> field order in the incoming data, as it keeps the id-name mapping for all
> columns.
>
> Here is a gist that captures the scenarios described above with sample
> data: https://gist.github.com/prodeezy/b2cc35b87fca7d43ae681d45b3d7cab3
>
> Cheers,
> -Gautam.
>
> On Wed, Sep 25, 2019 at 5:29 AM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
>
>> Hi Shone,
>>
>> Iceberg should be able to handle out of order data columns in nested
>> structures. We probably just need to relax that compatibility check to
>> allow it. Can you post the error message that you're getting?
>>
>> On Sun, Sep 22, 2019 at 4:49 AM Shone Sadler <ssad...@adobe.com.invalid>
>> wrote:
>>
>>> Hello everyone,
>>>
>>> This question is related to schema evolution support in Iceberg.
>>>
>>> We have data coming in with fields out of order w.r.t. the schema in
>>> Iceberg (e.g. inbound struct(a,b,c) vs. iceberg struct(c,b,a)).
>>>
>>> As a result, we are hitting the following error in Iceberg when saving
>>> the data -> "Cannot write incompatible dataset to table with schema",
>>> generated within the IcebergSource ->
>>> https://github.com/apache/incubator-iceberg/blob/d1f0b540f5f14f002be86133ef9f66445f7e0926/spark/src/main/java/org/apache/iceberg/spark/source/IcebergSource.java#L157
>>>
>>> I also noted in the documentation that re-ordering is allowed ->
>>> https://iceberg.apache.org/evolution/ , which led me to believe that we
>>> could update the schema prior to writing the data. However, I see no means
>>> of re-ordering fields in the current UpdateSchema API.
>>>
>>> How are people handling out-of-order fields today?
>>>
>>> Our data is deeply nested; as a result, I am hoping not to have to
>>> transform/prep on ingest and am looking for alternatives.
>>>
>>> Any thoughts appreciated!
>>>
>>> Regards,
>>> Shone Sadler
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix
