As an update, Nicolae did a first pass of reviewing https://github.com/apache/iceberg/pull/14004/files (thank you!). Any additional feedback is appreciated.
Thanks,
Micah

On Sun, Sep 7, 2025 at 9:53 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> https://github.com/apache/iceberg/pull/14004 is an attempt to update
> based on Ryan's feedback.
>
> Nicolae, some thoughts below:
>
>> Interpreting this for the Iceberg REST Catalog interaction: writers
>> should include the assert-current-schema-id requirement when
>> committing to a table. Otherwise, inconsistencies may be introduced. I
>> might be on a slightly outdated Iceberg build, but from what I can
>> see, Spark with Iceberg doesn't include this requirement during table
>> commits (example:
>> https://gist.github.com/nvartolomei/2fd6e994d1d9b5d61597c16339100518).
>
> As outlined in the new PR, I believe this depends on:
> 1. The isolation level
> 2. The operation being performed
>
> Ignoring changes between nullable and non-nullable fields (which aren't
> covered in the spec), I think there are times when using an out-of-date
> schema is fine, as long as one is not touching default values or one is
> using snapshot isolation. More details are in the PR; please let me know
> if it makes sense.
>
>> Do we consider this an implementation bug then?
>
> I'm not familiar enough with the implementation to say.
>
>> Also, what should a sensible implementation do when the requirement
>> fails? Reload the table schema and check if the new schema is
>> compatible with the writer schema (i.e. it wouldn't be subject to the
>> possible inconsistencies described by the spec) and otherwise rewrite
>> the data? For example, rewrite parquet files by adding new all-null
>> fields if needed, etc.
>
> This seems like the sensible thing to do for serializable isolation.
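>
> To make that concrete, here is a rough sketch in Java of that
> reload-check-rewrite recovery path. It assumes the writer attaches a
> current-schema assertion to its commits (which, per the gist above,
> implementations don't do today); isWriteCompatible and rewriteForSchema
> are hypothetical helpers, not Iceberg APIs:
>
>   import org.apache.iceberg.DataFile;
>   import org.apache.iceberg.Schema;
>   import org.apache.iceberg.Table;
>   import org.apache.iceberg.exceptions.CommitFailedException;
>
>   abstract class SchemaAwareAppender {
>     // Hypothetical: "compatible" here would encode the matrix from the
>     // PR (isolation level, no default-value changes touching our
>     // columns, etc.).
>     abstract boolean isWriteCompatible(Schema writerSchema, Schema currentSchema);
>
>     // Hypothetical: e.g. rewrite the parquet file adding new all-null
>     // fields, widening promoted types, and so on.
>     abstract DataFile rewriteForSchema(DataFile file, Schema currentSchema);
>
>     void append(Table table, DataFile file, Schema writerSchema) {
>       try {
>         // First attempt, planned against the writer's schema.
>         table.newAppend().appendFile(file).commit();
>       } catch (CommitFailedException e) {
>         table.refresh();
>         Schema current = table.schema();
>         if (isWriteCompatible(writerSchema, current)) {
>           table.newAppend().appendFile(file).commit(); // safe to retry as-is
>         } else {
>           table.newAppend().appendFile(rewriteForSchema(file, current)).commit();
>         }
>       }
>     }
>   }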
>
> Cheers,
> Micah
>
> On Fri, Sep 5, 2025 at 10:55 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
>> https://github.com/apache/iceberg/pull/14002 is the revert. I'll post a
>> new PR that tries to accommodate the replies on the thread.
>>
>> On Fri, Sep 5, 2025 at 8:53 AM Ryan Blue <rdb...@gmail.com> wrote:
>>
>>> Would anyone object to reverting this change? I don't think that it
>>> makes accurate statements. For example, I don't think that "not writing
>>> out all columns or not using the latest schema can change the semantics
>>> of the data written" is correct. It is perfectly fine to append with an
>>> older schema, and writers do this today. Saying that it may "change the
>>> semantics of the data written" makes this much scarier than it actually
>>> is.
>>>
>>> I think this confuses cases and errs too strongly on discouraging
>>> reasonable things. It is, of course, bad to use an older schema that
>>> drops columns from rows that are being updated. But that's a problem
>>> with the engine choosing to drop current data columns more than it is a
>>> schema selection issue.
>>>
>>> I think that the right path here is to state that operations should be
>>> bound to the current schema when they are planned. It is okay to
>>> continue producing data with an older schema in currently-running jobs.
>>> And we should call out the behavior of defaults. But I think that we
>>> should use language that clearly states what will happen rather than a
>>> generic "may change the semantics of data written".
>>>
>>> On Fri, Sep 5, 2025 at 5:05 AM Nicolae Vartolomei
>>> <n...@nvartolomei.com.invalid> wrote:
>>>
>>>> Quoting the newly added text in the spec:
>>>>
>>>> > Writers must write out all fields with the types specified from a
>>>> > schema present in table metadata. Writers should use the latest
>>>> > schema for writing. Not writing out all columns or not using the
>>>> > latest schema can change the semantics of the data written. The
>>>> > following are possible inconsistencies that can be introduced:
>>>>
>>>> Interpreting this for the Iceberg REST Catalog interaction: writers
>>>> should include the assert-current-schema-id requirement when
>>>> committing to a table. Otherwise, inconsistencies may be introduced. I
>>>> might be on a slightly outdated Iceberg build, but from what I can
>>>> see, Spark with Iceberg doesn't include this requirement during table
>>>> commits (example:
>>>> https://gist.github.com/nvartolomei/2fd6e994d1d9b5d61597c16339100518).
>>>>
>>>> Do we consider this an implementation bug then?
>>>>
>>>> Also, what should a sensible implementation do when the requirement
>>>> fails? Reload the table schema and check if the new schema is
>>>> compatible with the writer schema (i.e. it wouldn't be subject to the
>>>> possible inconsistencies described by the spec) and otherwise rewrite
>>>> the data? For example, rewrite parquet files by adding new all-null
>>>> fields if needed, etc.
>>>>
>>>> On Thu, Sep 4, 2025 at 10:19 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> >
>>>> > Just to follow up, the PR ended up getting merged. In theory there
>>>> > probably should have been a vote, but unless someone feels strongly
>>>> > or would like to fine-tune the language further, perhaps we can
>>>> > consider the topic resolved?
>>>> >
>>>> > On Wed, Aug 27, 2025 at 2:17 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> >>
>>>> >> I opened https://github.com/apache/iceberg/pull/13936 as a draft
>>>> >> proposal to capture the conversation.
>>>> >>
>>>> >> BTW, I think one area this brings up that the specification doesn't
>>>> >> handle is changing between nullable and non-nullable fields.
>>>> >> Outdated schemas have some implications in these cases as well.
>>>> >>
>>>> >> Cheers,
>>>> >> Micah
>>>> >>
>>>> >> On Tue, Aug 26, 2025 at 10:13 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> >>>
>>>> >>> I think the original question is ambiguous. We should probably
>>>> >>> split this into two questions:
>>>> >>>
>>>> >>> 1. Is it OK to write out an "int" instead of a "long" if the
>>>> >>> writer's schema says the value is a long?
>>>> >>>
>>>> >>> I think the answer here is that we recommend not doing so, even
>>>> >>> though it would likely work.
>>>> >>>
>>>> >>> 2. Is it OK to use an older schema for writing?
>>>> >>>
>>>> >>> The consensus on the thread seems to be yes. I'll note that this
>>>> >>> can cause confusing results when the "write-default" [1] value for
>>>> >>> a column changes. We should probably have an implementation note
>>>> >>> to clarify:
>>>> >>> a. Using a stale schema is allowed.
>>>> >>> b. It might cause inconsistent results in the face of multiple
>>>> >>> writers when default values are used.
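>>>> >>>
>>>> >>> To illustrate (b), here is a minimal, self-contained sketch (plain
>>>> >>> Java, no Iceberg dependency, names invented) of two writers
>>>> >>> resolving an unwritten column against different write-defaults:
>>>> >>>
>>>> >>>   import java.util.HashMap;
>>>> >>>   import java.util.Map;
>>>> >>>
>>>> >>>   public class WriteDefaultSkew {
>>>> >>>     // Fill columns the writer did not set from its schema's
>>>> >>>     // write-defaults.
>>>> >>>     static Map<String, Object> materialize(Map<String, Object> row,
>>>> >>>                                            Map<String, Object> writeDefaults) {
>>>> >>>       Map<String, Object> out = new HashMap<>(row);
>>>> >>>       writeDefaults.forEach(out::putIfAbsent);
>>>> >>>       return out;
>>>> >>>     }
>>>> >>>
>>>> >>>     public static void main(String[] args) {
>>>> >>>       Map<String, Object> row = Map.of("id", 1L); // no writer sets "category"
>>>> >>>       // Schema v2 had write-default category='red'; v3 changed it to 'blue'.
>>>> >>>       System.out.println(materialize(row, Map.of("category", "red")));  // stale writer
>>>> >>>       System.out.println(materialize(row, Map.of("category", "blue"))); // current writer
>>>> >>>     }
>>>> >>>   }
>>>> >>>
>>>> >>> Both commits would succeed, but logically identical rows end up
>>>> >>> with different stored values depending on which writer handled
>>>> >>> them.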
>>>> >>>
>>>> >>> Thoughts?
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Micah
>>>> >>>
>>>> >>> On Mon, Aug 25, 2025 at 4:59 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> I agree with Dan that type promotion should be well-defined. If
>>>> >>>> it's a grey area then we should clarify it in the spec.
>>>> >>>>
>>>> >>>> How it works today is that schema evolution always produces a
>>>> >>>> schema that can read files written with any older schema. When a
>>>> >>>> type is promoted, the new schema can read any older data file,
>>>> >>>> but readers may need to promote values like the [int-to-long reader](
>>>> >>>> https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java#L546-L560)
>>>> >>>> does. You aren't guaranteed to be able to read new data using an
>>>> >>>> older schema, so the latest schema should always be used, or you
>>>> >>>> should use the schema attached to a snapshot.
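>>>> >>>>
>>>> >>>> In the same spirit as that reader (a rough sketch, not the actual
>>>> >>>> ParquetValueReaders code), a promoting read is just a widening
>>>> >>>> conversion at decode time:
>>>> >>>>
>>>> >>>>   import java.util.PrimitiveIterator;
>>>> >>>>
>>>> >>>>   // The read schema says long, but the data file stores INT32:
>>>> >>>>   // wrap the int column reader and widen each value as it is
>>>> >>>>   // decoded.
>>>> >>>>   final class IntAsLongReader {
>>>> >>>>     private final PrimitiveIterator.OfInt stored; // decoded INT32 values
>>>> >>>>
>>>> >>>>     IntAsLongReader(PrimitiveIterator.OfInt stored) {
>>>> >>>>       this.stored = stored;
>>>> >>>>     }
>>>> >>>>
>>>> >>>>     long read() {
>>>> >>>>       return stored.nextInt(); // implicit int -> long widening; no file rewrite
>>>> >>>>     }
>>>> >>>>   }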
>>>> >>>>
>>>> >>>> Because files with older schemas can always be read, it is safe
>>>> >>>> to write files with an older schema. This happens fairly
>>>> >>>> regularly, as Steven noted, in cases where a writer has a fixed
>>>> >>>> schema and is long-running.
>>>> >>>>
>>>> >>>> Ryan
>>>> >>>>
>>>> >>>> On Thu, Aug 21, 2025 at 5:37 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> > This means that you can have writers using different schemas
>>>> >>>>> > to write (use cases include different partitioning or
>>>> >>>>> > "out-of-date" writers), but the data is still valid.
>>>> >>>>>
>>>> >>>>> +1 on Dan's point. Both batch and streaming writers can have
>>>> >>>>> stale schemas. Long-running streaming jobs may stay stale for
>>>> >>>>> extended periods before picking up the new schema during a
>>>> >>>>> restart.
>>>> >>>>>
>>>> >>>>> On Wed, Aug 20, 2025 at 2:50 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>> >>>>>>
>>>> >>>>>> I think I'm going to disagree and argue that it's not really a
>>>> >>>>>> gray area.
>>>> >>>>>>
>>>> >>>>>> The strict schema evolution rules and the way schemas are
>>>> >>>>>> tracked mean that writer and reader schemas are independent yet
>>>> >>>>>> remain compatible due to the evolution rules.
>>>> >>>>>>
>>>> >>>>>> This means that you can have writers using different schemas to
>>>> >>>>>> write (use cases include different partitioning or
>>>> >>>>>> "out-of-date" writers), but the data is still valid.
>>>> >>>>>>
>>>> >>>>>> Promoting the physical representation during a read/scan
>>>> >>>>>> operation results in a presentation consistent with the read
>>>> >>>>>> schema.
>>>> >>>>>>
>>>> >>>>>> All of the representations are technically valid.
>>>> >>>>>>
>>>> >>>>>> -Dan
>>>> >>>>>>
>>>> >>>>>> On Mon, Aug 18, 2025 at 7:46 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>> >>>>>>>
>>>> >>>>>>> +1 to what Micah said :) sorry about the typo
>>>> >>>>>>>
>>>> >>>>>>> On Mon, Aug 18, 2025 at 9:45 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> +1 to what Micaah. We have never really written rules about
>>>> >>>>>>>> what is "allowed" in this particular context, but since a
>>>> >>>>>>>> reader needs to be able to handle both int/long values for
>>>> >>>>>>>> the column, there isn't really any danger in writing new
>>>> >>>>>>>> files with the narrower type. If a reader couldn't handle
>>>> >>>>>>>> this, then type promotion would be impossible.
>>>> >>>>>>>>
>>>> >>>>>>>> I would include all columns in the file; the space
>>>> >>>>>>>> requirements for an all-null column (or all-constant column)
>>>> >>>>>>>> should be very small. I believe the reason we originally
>>>> >>>>>>>> wrote those rules in was to avoid folks doing the Hive-style
>>>> >>>>>>>> implicit columns from the partition tuple (although we also
>>>> >>>>>>>> have handling for this).
>>>> >>>>>>>>
>>>> >>>>>>>> On Sun, Aug 17, 2025 at 11:15 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>> Hi Nic,
>>>> >>>>>>>>>
>>>> >>>>>>>>> This is IMO a gray area.
>>>> >>>>>>>>>
>>>> >>>>>>>>>> However, is it allowed to commit *new* parquet files with
>>>> >>>>>>>>>> the old types (int) and commit them to the table with a
>>>> >>>>>>>>>> table schema where types are promoted (long)?
>>>> >>>>>>>>>
>>>> >>>>>>>>> IMO, I would expect writers to write files that are
>>>> >>>>>>>>> consistent with the current metadata, so ideally they would
>>>> >>>>>>>>> not be written with int if the type is now long. In
>>>> >>>>>>>>> practice, though, I think most readers are robust to reading
>>>> >>>>>>>>> type-promoted files. We should probably clarify this in the
>>>> >>>>>>>>> specification.
>>>> >>>>>>>>>
>>>> >>>>>>>>>> Also, is it allowed to commit parquet files, in general,
>>>> >>>>>>>>>> which contain only a subset of the columns of the table
>>>> >>>>>>>>>> schema? I.e. if I know a column is all NULLs, can we just
>>>> >>>>>>>>>> skip writing it?
>>>> >>>>>>>>>
>>>> >>>>>>>>> As currently worded, the spec on writing data files
>>>> >>>>>>>>> (https://iceberg.apache.org/spec/#writing-data-files) says
>>>> >>>>>>>>> files should include all columns. Based on the column
>>>> >>>>>>>>> projection rules, however, failing to do so should also not
>>>> >>>>>>>>> cause problems.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Cheers,
>>>> >>>>>>>>> Micah
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Fri, Aug 15, 2025 at 8:45 AM Nicolae Vartolomei <n...@nvartolomei.com.invalid> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Hi,
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I'm implementing an Iceberg writer[^1] and have a question
>>>> >>>>>>>>>> about what type promotion actually means as part of the
>>>> >>>>>>>>>> schema evolution rules.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> The Iceberg spec [specifies][spec-evo] which type
>>>> >>>>>>>>>> promotions are allowed. No confusion there.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> The confusion on my end arises when it comes to actually
>>>> >>>>>>>>>> writing, e.g., parquet data. Let's take for example the int
>>>> >>>>>>>>>> to long promotion. What is actually allowed under this
>>>> >>>>>>>>>> promotion rule? Let me try to show what I mean.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Obviously, if I have a schema-id N with field A of type int
>>>> >>>>>>>>>> and table snapshots with this schema, then it is possible
>>>> >>>>>>>>>> to update the table schema-id to > N where field A now has
>>>> >>>>>>>>>> type long, and this new schema can read parquet files with
>>>> >>>>>>>>>> the old type.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> However, is it allowed to commit *new* parquet files with
>>>> >>>>>>>>>> the old types (int) and commit them to the table with a
>>>> >>>>>>>>>> table schema where types are promoted (long)?
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Also, is it allowed to commit parquet files, in general,
>>>> >>>>>>>>>> which contain only a subset of the columns of the table
>>>> >>>>>>>>>> schema? I.e. if I know a column is all NULLs, can we just
>>>> >>>>>>>>>> skip writing it?
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I appreciate you taking the time to look at this,
>>>> >>>>>>>>>> Nic
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> [spec-evo]: https://iceberg.apache.org/spec/#schema-evolution
>>>> >>>>>>>>>> [^1]: This is for the Redpanda to Iceberg native
>>>> >>>>>>>>>> integration (https://github.com/redpanda-data/redpanda).