Re: [DISCUSS] Change Behavior for SchemaUpdate.UnionByName

Fokko Driesprong Fri, 01 Nov 2024 03:01:43 -0700

Hey Rocco,

Thanks for raising this. I don't have any strong feelings about this, and I
agree with Russell that it should not throw an exception.


I guess there was no strong reason behind how it is today, but it's just
because we leverage the UpdateSchema API, which raises an exception when
doing the downcast.

Also, on the Python side, it will throw an exception when you take a union
of an int and long, but it is pretty straightforward to loosen that
requirement: https://github.com/apache/iceberg-python/pull/1283
<https://github.com/apache/iceberg-python/pull/1283>

Kind regards,
Fokko

Op do 31 okt 2024 om 19:00 schreef Russell Spitzer <
russell.spit...@gmail.com>:

> I'm in favor of 1 since previously these inputs would have thrown an
> exception that wasn't really that helpful.
>
> @Test
> public void testDowncastoLongToInt() {
>   Schema currentSchema = new Schema(required(1, "aCol", LongType.get()));
>   Schema newSchema = new Schema(required(1, "aCol", IntegerType.get()));
>
>   assertThatThrownBy(() -> new SchemaUpdate(currentSchema, 
> 1).unionByNameWith(newSchema).apply());
> }
>
> I think removing states in which the API would fail is good and although
> we didn't document it exactly this way before, I'm having a hard time
> thinking of a case in which throwing an exception here would have been
> preferable to a noop? Does anyone else have strong feelings around this?
>
>
> On Thu, Oct 31, 2024 at 12:40 PM Rocco Varela <rocco.var...@gmail.com>
> wrote:
>
>> Hi everyone,
>>
>> Apologize if this is landing twice, my first attempt got lost somewhere
>> in transit :)
>>
>> I have a PR that attempts to address
>> https://github.com/apache/iceberg/issues/4849, basically adding logic to
>> ignore downcasting column types when "mergeSchema" is set when an existing
>> column type is long and the new schema has an int type for the same column.
>>
>> My solution involves updates to UnionByNameVisitor and this may end up
>> changing the behavior of our public api in a way that hasn't previously
>> been documented.
>>
>> Questions raised during the review is whether we should do one of the
>> following:
>>
>>    1. Update our docs in UpdateSchema.unionByNameWith
>>    <http://updateschema.unionbynamewith/> and callout something like "We
>>    ignore differences in type if the new type is narrower than the existing
>>    type", or
>>    2. We add a new api UpdateSchema.unionByNameWith(Schema newSchema,
>>    boolean ignoreTypeNarrowing)
>>
>>
>> Any feedback would be appreciated, thanks for your time.
>>
>> Cheers,
>>
>> --Rocco
>>
>

Re: [DISCUSS] Change Behavior for SchemaUpdate.UnionByName

Reply via email to