Just wanted to bump this. Ryan, since you had objections to the regular PR, would you mind taking a look before we start a vote?
Thanks,
Micah

On Tue, Sep 30, 2025 at 10:01 AM Micah Kornfield <emkornfi...@gmail.com> wrote:

> As an update, Nicolae did a first pass of reviewing https://github.com/apache/iceberg/pull/14004/files (thank you). Any additional feedback is appreciated.
>
> Thanks,
> Micah
>
> On Sun, Sep 7, 2025 at 9:53 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
>> https://github.com/apache/iceberg/pull/14004 is an attempt to update based on Ryan's feedback.
>>
>> Nicolae, some thoughts below:
>>
>>> Interpreting this for the Iceberg REST Catalog interaction: writers should include the assert-current-schema-id requirement when committing to a table. Otherwise, inconsistencies may be introduced. I might be on a slightly outdated Iceberg build, but from what I can see, Spark with Iceberg doesn't include this requirement during table commits (example: https://gist.github.com/nvartolomei/2fd6e994d1d9b5d61597c16339100518).
>>
>> As outlined in the new PR, I believe this depends on:
>> 1. The isolation level
>> 2. The operation being performed
>>
>> Ignoring changing nullable fields to non-nullable (which isn't covered in the spec), I think there are times when using out-of-date schemas is fine, as long as one is not touching default values or one is using snapshot isolation. More details are in the PR; please let me know if it makes sense.
>>
>>> Do we consider this an implementation bug then?
>>
>> I'm not familiar enough with the implementation to say.
>>
>>> Also, what should a sensible implementation do when the requirement fails? Reload the table schema and check if the new schema is compatible with the writer schema (i.e. it wouldn't be subject to the possible inconsistencies as described by the spec) and otherwise rewrite the data? For example, rewrite parquet files by adding new all-null fields if needed, etc.
>>
>> This seems like the sensible thing to do for serializable isolation.
>>
>> Cheers,
>> Micah
>>
>> On Fri, Sep 5, 2025 at 10:55 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>>> https://github.com/apache/iceberg/pull/14002 is the revert; I'll post a new PR which tries to accommodate the replies on the thread.
>>>
>>> On Fri, Sep 5, 2025 at 8:53 AM Ryan Blue <rdb...@gmail.com> wrote:
>>>
>>>> Would anyone object to reverting this change? I don't think that it makes accurate statements. For example, I don't think that "not writing out all columns or not using the latest schema can change the semantics of the data written" is correct. It is perfectly fine to append with an older schema, and writers do this today. Saying that it may "change the semantics of the data written" makes this much scarier than it actually is.
>>>>
>>>> I think this confuses cases and errs too strongly on discouraging reasonable things. It is, of course, bad to use an older schema that drops columns from rows that are being updated. But that's a problem with the engine choosing to drop current data columns more than it is a schema selection issue.
>>>>
>>>> I think that the right path here is to state that operations should be bound to the current schema when they are planned. It is okay to continue producing data with an older schema in currently-running jobs. And we should call out the behavior of defaults. But I do think that we should use language that clearly states what will happen rather than a generic "may change the semantics of data written".
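To make "bound to the current schema when planned" concrete on the commit path, here is a minimal sketch of the validation implied by the REST catalog's assert-current-schema-id requirement. All class and method names are hypothetical; this is not the actual Iceberg REST catalog code:

    // Hypothetical sketch: the catalog rejects a commit if the table's current
    // schema changed after the writer planned its operation.
    record AssertCurrentSchemaId(int expectedSchemaId) {
        void validate(int actualCurrentSchemaId) {
            if (actualCurrentSchemaId != expectedSchemaId) {
                // On failure a client could reload the table, check whether its
                // write schema is still compatible, and retry (or rewrite files).
                throw new IllegalStateException(
                    "Requirement failed: current schema id is "
                        + actualCurrentSchemaId + ", expected " + expectedSchemaId);
            }
        }
    }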
>>>> On Fri, Sep 5, 2025 at 5:05 AM Nicolae Vartolomei <n...@nvartolomei.com.invalid> wrote:
>>>>
>>>>> Quoting the newly added text in the spec:
>>>>>
>>>>> > Writers must write out all fields with the types specified from a schema present in table metadata. Writers should use the latest schema for writing. Not writing out all columns or not using the latest schema can change the semantics of the data written. The following are possible inconsistencies that can be introduced:
>>>>>
>>>>> Interpreting this for the Iceberg REST Catalog interaction: writers should include the assert-current-schema-id requirement when committing to a table. Otherwise, inconsistencies may be introduced. I might be on a slightly outdated Iceberg build, but from what I can see, Spark with Iceberg doesn't include this requirement during table commits (example: https://gist.github.com/nvartolomei/2fd6e994d1d9b5d61597c16339100518).
>>>>>
>>>>> Do we consider this an implementation bug then?
>>>>>
>>>>> Also, what should a sensible implementation do when the requirement fails? Reload the table schema and check if the new schema is compatible with the writer schema (i.e. it wouldn't be subject to the possible inconsistencies as described by the spec) and otherwise rewrite the data? For example, rewrite parquet files by adding new all-null fields if needed, etc.
>>>>>
>>>>> On Thu, Sep 4, 2025 at 10:19 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>> >
>>>>> > Just to follow up, the PR ended up getting merged. In theory there probably should have been a vote, but unless someone feels strongly or would like to fine-tune the language further, perhaps we can consider the topic resolved?
>>>>> >
>>>>> > On Wed, Aug 27, 2025 at 2:17 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>> >>
>>>>> >> I opened https://github.com/apache/iceberg/pull/13936 as a draft proposal to capture the conversation.
>>>>> >>
>>>>> >> BTW, I think one area this brings up that the specification doesn't handle is changing between nullable and non-nullable fields. Outdated schemas have some implications in these cases as well.
>>>>> >>
>>>>> >> Cheers,
>>>>> >> Micah
>>>>> >>
>>>>> >> On Tue, Aug 26, 2025 at 10:13 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>> >>>
>>>>> >>> I think the original question is ambiguous. We should probably split this into two questions:
>>>>> >>>
>>>>> >>> 1. Is it OK to write out an "int" instead of a "long" if the writer's schema says the value is a long?
>>>>> >>>
>>>>> >>> I think the answer here is that we recommend not doing so, even though it would likely work.
>>>>> >>>
>>>>> >>> 2. Is it OK to use an older schema for writing?
>>>>> >>>
>>>>> >>> The consensus on the thread seems to be yes. I'll note that this can cause confusing results when the "write-default" [1] value for a column changes. We should probably have an implementation note to clarify:
>>>>> >>> a. Using a stale schema is allowed.
>>>>> >>> b. It might cause inconsistent results in the face of multiple writers when default values are used (see the sketch below).
>>>>> >>>
>>>>> >>> Thoughts?
>>>>> >>>
>>>>> >>> Thanks,
>>>>> >>> Micah
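A minimal sketch of the inconsistency in (b), with every name invented for the example (in a real table, write-defaults live in the schema's field metadata, not a bare map):

    import java.util.Map;

    // Writer A still holds schema v1, whose write-default for field id 3 is 0;
    // writer B holds schema v2, where the write-default was changed to -1.
    // Each writer materializes the default from its own schema at write time,
    // so rows appended without an explicit value for the field end up with
    // different stored values depending on which writer appended them.
    public class WriteDefaultDrift {
        public static void main(String[] args) {
            Map<Integer, Integer> schemaV1WriteDefaults = Map.of(3, 0);
            Map<Integer, Integer> schemaV2WriteDefaults = Map.of(3, -1);

            System.out.println("writer A stores: " + schemaV1WriteDefaults.get(3)); // 0
            System.out.println("writer B stores: " + schemaV2WriteDefaults.get(3)); // -1
        }
    }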
>>>>> >>> On Mon, Aug 25, 2025 at 4:59 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>> >>>>
>>>>> >>>> I agree with Dan that type promotion should be well-defined. If it's a grey area, then we should clarify it in the spec.
>>>>> >>>>
>>>>> >>>> How it works today is that schema evolution always produces a schema that can read files written with any older schema. When a type is promoted, the new schema can read any older data file, but readers may need to promote values like the int-to-long reader (https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java#L546-L560) does. You aren't guaranteed to be able to read new data using an older schema, so the latest schema should always be used, or you should use the schema attached to a snapshot.
>>>>> >>>>
>>>>> >>>> Because files with older schemas can always be read, it is safe to write files with an older schema. This happens fairly regularly, as Steven noted, in cases where a writer has a fixed schema and is long-running.
>>>>> >>>>
>>>>> >>>> Ryan
>>>>> >>>>
>>>>> >>>> On Thu, Aug 21, 2025 at 5:37 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>> >>>>>
>>>>> >>>>> > This means that you can have writers using different schemas to write (use cases include different partitioning or "out-of-date" writers), but the data is still valid.
>>>>> >>>>>
>>>>> >>>>> +1 on Dan's point. Both batch and streaming writers can have stale schemas. Long-running streaming jobs may stay stale for extended periods before picking up the new schema during a restart.
>>>>> >>>>>
>>>>> >>>>> On Wed, Aug 20, 2025 at 2:50 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>>> >>>>>>
>>>>> >>>>>> I think I'm going to disagree and argue that it's not really a gray area.
>>>>> >>>>>>
>>>>> >>>>>> Having strict schema evolution rules, and tracking how schemas evolve, means that there is independence between writer and reader schemas, which remain compatible due to the evolution rules.
>>>>> >>>>>>
>>>>> >>>>>> This means that you can have writers using different schemas to write (use cases include different partitioning or "out-of-date" writers), but the data is still valid.
>>>>> >>>>>>
>>>>> >>>>>> How you promote the physical representation during a read/scan operation results in a consistent presentation with the read schema.
>>>>> >>>>>>
>>>>> >>>>>> All of the representations are technically valid.
>>>>> >>>>>>
>>>>> >>>>>> -Dan
>>>>> >>>>>>
>>>>> >>>>>> On Mon, Aug 18, 2025 at 7:46 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>> >>>>>>>
>>>>> >>>>>>> +1 to what Micah said :) sorry about the typo
>>>>> >>>>>>>
>>>>> >>>>>>> On Mon, Aug 18, 2025 at 9:45 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>> >>>>>>>>
>>>>> >>>>>>>> +1 to what Micaah said. We have never really written rules about what is "allowed" in this particular context, but since a reader needs to be able to handle both int and long values for the column, there isn't really any danger in writing new files with the narrower type. If a reader couldn't handle this, then type promotion would be impossible.
>>>>> >>>>>>>>
>>>>> >>>>>>>> I would include all columns in the file; the space requirements for an all-null column (or an all-constant column) should be very small. I believe the reason we originally wrote those rules in was to avoid folks doing the Hive-style implicit columns from the partition tuple (although we also have handling for this).
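As a rough illustration of the read-side handling described above, a promoted reader only needs to widen values as it materializes them. A sketch with invented names (this is not the actual ParquetValueReaders code):

    import java.util.PrimitiveIterator;

    // Read-time int-to-long promotion: the data file stores 32-bit ints, the
    // read schema asks for long, and the reader widens each value on the fly.
    // The widening is lossless, which is why files written with the narrower
    // type remain readable under the promoted schema.
    final class IntAsLongReader {
        private final PrimitiveIterator.OfInt fileValues;

        IntAsLongReader(PrimitiveIterator.OfInt fileValues) {
            this.fileValues = fileValues;
        }

        long next() {
            return (long) fileValues.nextInt();
        }
    }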
>>>>> >>>>>>>> On Sun, Aug 17, 2025 at 11:15 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> Hi Nic,
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> This is, IMO, a gray area.
>>>>> >>>>>>>>>
>>>>> >>>>>>>>>> However, is it allowed to commit *new* parquet files with the old types (int) and commit them to the table with a table schema where types are promoted (long)?
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> IMO, I would expect writers to write files that are consistent with the current metadata, so ideally they would not be written with int if the column is now long. In general, though, I think most readers are robust to reading type-promoted files. We should probably clarify this in the specification.
>>>>> >>>>>>>>>
>>>>> >>>>>>>>>> Also, is it allowed to commit parquet files, in general, which contain only a subset of columns of table schema? I.e. if I know a column is all NULLs, can we just skip writing it?
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> As currently worded, the spec on writing data files (https://iceberg.apache.org/spec/#writing-data-files) says writers should include all columns. Based on the column projection rules, however, failing to do so should also not cause problems.
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> Cheers,
>>>>> >>>>>>>>> Micah
>>>>> >>>>>>>>>
>>>>> >>>>>>>>> On Fri, Aug 15, 2025 at 8:45 AM Nicolae Vartolomei <n...@nvartolomei.com.invalid> wrote:
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Hi,
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> I'm implementing an Iceberg writer[^1] and have a question about what type promotion actually means as part of the schema evolution rules.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> The Iceberg spec [specifies][spec-evo] which type promotions are allowed. No confusion there.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> The confusion on my end arises when it comes to actually writing, e.g., Parquet data. Let's take for example the int-to-long promotion. What is actually allowed under this promotion rule? Let me try to show what I mean.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Obviously, if I have a schema-id N with field A of type int and table snapshots with this schema, then it is possible to update the table schema-id to > N where field A now has type long, and this new schema can read parquet files with the old type.
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> However, is it allowed to write *new* parquet files with the old type (int) and commit them to the table with a table schema where the type is promoted (long)?
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Also, is it allowed to commit parquet files, in general, which contain only a subset of the columns of the table schema? I.e., if I know a column is all NULLs, can we just skip writing it?
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> Appreciate you taking the time to look at this,
>>>>> >>>>>>>>>> Nic
>>>>> >>>>>>>>>>
>>>>> >>>>>>>>>> [spec-evo]: https://iceberg.apache.org/spec/#schema-evolution
>>>>> >>>>>>>>>> [^1]: This is for the Redpanda to Iceberg native integration (https://github.com/redpanda-data/redpanda).
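For reference, the promotion Nicolae asks about is a metadata-only change. A minimal sketch using the Iceberg Java API, assuming table is an already-loaded org.apache.iceberg.Table whose column "a" is currently an int:

    import org.apache.iceberg.Table;
    import org.apache.iceberg.types.Types;

    // Promote column "a" from int to long. This commits a new schema with a
    // higher schema id; existing data files keep their int physical type and
    // are widened at read time.
    class PromoteColumnExample {
        static void promoteAToLong(Table table) {
            table.updateSchema()
                .updateColumn("a", Types.LongType.get())
                .commit();
        }
    }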