As an update, Nicolae did a first pass of reviewing https://github.com/apache/iceberg/pull/14004/files (thank you!). Any additional feedback is appreciated.
Thanks,
Micah

On Sun, Sep 7, 2025 at 9:53 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> https://github.com/apache/iceberg/pull/14004 is an attempt to update
> based on Ryan's feedback.
>
> Nicolae, some thoughts below:
>
>> Interpreting this for the Iceberg REST Catalog interaction: writers
>> should include the assert-current-schema-id requirement when
>> committing to a table. Otherwise, inconsistencies may be introduced. I
>> might be on a slightly outdated Iceberg build, but from what I can
>> see, Spark with Iceberg doesn't include this requirement during table
>> commits (example:
>> https://gist.github.com/nvartolomei/2fd6e994d1d9b5d61597c16339100518).
>
> As outlined in the new PR, I believe this depends on:
> 1. The isolation level
> 2. The operation being performed
>
> Ignoring changes between nullable and non-nullable fields (which aren't
> covered in the spec), I think there are times when using an out-of-date
> schema is fine, as long as one is not touching default values or one is
> using snapshot isolation. More details are in the PR; please let me know
> if it makes sense.
>
>> Do we consider this an implementation bug then?
>
> I'm not familiar enough with the implementation to say.
>
>> Also, what should a sensible implementation do when the requirement
>> fails? Reload the table schema and check if the new schema is
>> compatible with the writer schema (i.e. it wouldn't be subject to the
>> possible inconsistencies described by the spec) and otherwise rewrite
>> the data? For example, rewrite parquet files by adding new all-null
>> fields if needed, etc.
>
> This seems like the sensible thing to do for serializable isolation.
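>
> To make that concrete, here is a rough sketch in Java of that
> reload-check-rewrite recovery path. It assumes the writer attaches a
> current-schema assertion to its commits (which, per the gist above,
> implementations don't do today); isWriteCompatible and rewriteForSchema
> are hypothetical helpers, not Iceberg APIs:
>
>   import org.apache.iceberg.DataFile;
>   import org.apache.iceberg.Schema;
>   import org.apache.iceberg.Table;
>   import org.apache.iceberg.exceptions.CommitFailedException;
>
>   abstract class SchemaAwareAppender {
>     // Hypothetical: "compatible" here would encode the matrix from the
>     // PR (isolation level, no default-value changes touching our
>     // columns, etc.).
>     abstract boolean isWriteCompatible(Schema writerSchema, Schema currentSchema);
>
>     // Hypothetical: e.g. rewrite the parquet file adding new all-null
>     // fields, widening promoted types, and so on.
>     abstract DataFile rewriteForSchema(DataFile file, Schema currentSchema);
>
>     void append(Table table, DataFile file, Schema writerSchema) {
>       try {
>         // First attempt, planned against the writer's schema.
>         table.newAppend().appendFile(file).commit();
>       } catch (CommitFailedException e) {
>         table.refresh();
>         Schema current = table.schema();
>         if (isWriteCompatible(writerSchema, current)) {
>           table.newAppend().appendFile(file).commit(); // safe to retry as-is
>         } else {
>           table.newAppend().appendFile(rewriteForSchema(file, current)).commit();
>         }
>       }
>     }
>   }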
>
> Cheers,
> Micah
>
> On Fri, Sep 5, 2025 at 10:55 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
>> https://github.com/apache/iceberg/pull/14002 is the revert. I'll post a
>> new PR that tries to accommodate the replies on the thread.
>>
>> On Fri, Sep 5, 2025 at 8:53 AM Ryan Blue <rdb...@gmail.com> wrote:
>>
>>> Would anyone object to reverting this change? I don't think that it
>>> makes accurate statements. For example, I don't think that "not writing
>>> out all columns or not using the latest schema can change the semantics
>>> of the data written" is correct. It is perfectly fine to append with an
>>> older schema, and writers do this today. Saying that it may "change the
>>> semantics of the data written" makes this much scarier than it actually
>>> is.
>>>
>>> I think this confuses cases and errs too strongly on discouraging
>>> reasonable things. It is, of course, bad to use an older schema that
>>> drops columns from rows that are being updated. But that's a problem
>>> with the engine choosing to drop current data columns more than it is a
>>> schema selection issue.
>>>
>>> I think that the right path here is to state that operations should be
>>> bound to the current schema when they are planned. It is okay to
>>> continue producing data with an older schema in currently-running jobs.
>>> And we should call out the behavior of defaults. But I think that we
>>> should use language that clearly states what will happen rather than a
>>> generic "may change the semantics of data written".
>>>
>>> On Fri, Sep 5, 2025 at 5:05 AM Nicolae Vartolomei
>>> <n...@nvartolomei.com.invalid> wrote:
>>>
>>>> Quoting the newly added text in the spec:
>>>>
>>>> > Writers must write out all fields with the types specified from a
>>>> > schema present in table metadata. Writers should use the latest
>>>> > schema for writing. Not writing out all columns or not using the
>>>> > latest schema can change the semantics of the data written. The
>>>> > following are possible inconsistencies that can be introduced:
>>>>
>>>> Interpreting this for the Iceberg REST Catalog interaction: writers
>>>> should include the assert-current-schema-id requirement when
>>>> committing to a table. Otherwise, inconsistencies may be introduced. I
>>>> might be on a slightly outdated Iceberg build, but from what I can
>>>> see, Spark with Iceberg doesn't include this requirement during table
>>>> commits (example:
>>>> https://gist.github.com/nvartolomei/2fd6e994d1d9b5d61597c16339100518).
>>>>
>>>> Do we consider this an implementation bug then?
>>>>
>>>> Also, what should a sensible implementation do when the requirement
>>>> fails? Reload the table schema and check if the new schema is
>>>> compatible with the writer schema (i.e. it wouldn't be subject to the
>>>> possible inconsistencies described by the spec) and otherwise rewrite
>>>> the data? For example, rewrite parquet files by adding new all-null
>>>> fields if needed, etc.
>>>>
>>>> On Thu, Sep 4, 2025 at 10:19 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> >
>>>> > Just to follow up, the PR ended up getting merged. In theory there
>>>> > probably should have been a vote, but unless someone feels strongly
>>>> > or would like to fine-tune the language further, perhaps we can
>>>> > consider the topic resolved?
>>>> >
>>>> > On Wed, Aug 27, 2025 at 2:17 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> >>
>>>> >> I opened https://github.com/apache/iceberg/pull/13936 as a draft
>>>> >> proposal to capture the conversation.
>>>> >>
>>>> >> BTW, I think one area this brings up that the specification doesn't
>>>> >> handle is changing between nullable and non-nullable fields.
>>>> >> Outdated schemas have some implications in these cases as well.
>>>> >>
>>>> >> Cheers,
>>>> >> Micah
>>>> >>
>>>> >> On Tue, Aug 26, 2025 at 10:13 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> >>>
>>>> >>> I think the original question is ambiguous. We should probably
>>>> >>> split this into two questions:
>>>> >>>
>>>> >>> 1. Is it OK to write out an "int" instead of a "long" if the
>>>> >>> writer's schema says the value is a long?
>>>> >>>
>>>> >>> I think the answer here is that we recommend not doing so, even
>>>> >>> though it would likely work.
>>>> >>>
>>>> >>> 2. Is it OK to use an older schema for writing?
>>>> >>>
>>>> >>> The consensus on the thread seems to be yes. I'll note that this
>>>> >>> can cause confusing results when the "write-default" [1] value for
>>>> >>> a column changes. We should probably have an implementation note
>>>> >>> to clarify:
>>>> >>> a. Using a stale schema is allowed.
>>>> >>> b. It might cause inconsistent results in the face of multiple
>>>> >>> writers when default values are used.
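>>>> >>>
>>>> >>> To illustrate (b), here is a minimal, self-contained sketch (plain
>>>> >>> Java, no Iceberg dependency, names invented) of two writers
>>>> >>> resolving an unwritten column against different write-defaults:
>>>> >>>
>>>> >>>   import java.util.HashMap;
>>>> >>>   import java.util.Map;
>>>> >>>
>>>> >>>   public class WriteDefaultSkew {
>>>> >>>     // Fill columns the writer did not set from its schema's
>>>> >>>     // write-defaults.
>>>> >>>     static Map<String, Object> materialize(Map<String, Object> row,
>>>> >>>                                            Map<String, Object> writeDefaults) {
>>>> >>>       Map<String, Object> out = new HashMap<>(row);
>>>> >>>       writeDefaults.forEach(out::putIfAbsent);
>>>> >>>       return out;
>>>> >>>     }
>>>> >>>
>>>> >>>     public static void main(String[] args) {
>>>> >>>       Map<String, Object> row = Map.of("id", 1L); // no writer sets "category"
>>>> >>>       // Schema v2 had write-default category='red'; v3 changed it to 'blue'.
>>>> >>>       System.out.println(materialize(row, Map.of("category", "red")));  // stale writer
>>>> >>>       System.out.println(materialize(row, Map.of("category", "blue"))); // current writer
>>>> >>>     }
>>>> >>>   }
>>>> >>>
>>>> >>> Both commits would succeed, but logically identical rows end up
>>>> >>> with different stored values depending on which writer handled
>>>> >>> them.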
>>>> >>>
>>>> >>> Thoughts?
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Micah
>>>> >>>
>>>> >>> On Mon, Aug 25, 2025 at 4:59 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>> >>>>
>>>> >>>> I agree with Dan that type promotion should be well-defined. If
>>>> >>>> it's a grey area then we should clarify it in the spec.
>>>> >>>>
>>>> >>>> How it works today is that schema evolution always produces a
>>>> >>>> schema that can read files written with any older schema. When a
>>>> >>>> type is promoted, the new schema can read any older data file,
>>>> >>>> but readers may need to promote values like the [int-to-long reader](
>>>> >>>> https://github.com/apache/iceberg/blob/main/parquet/src/main/java/org/apache/iceberg/parquet/ParquetValueReaders.java#L546-L560)
>>>> >>>> does. You aren't guaranteed to be able to read new data using an
>>>> >>>> older schema, so the latest schema should always be used, or you
>>>> >>>> should use the schema attached to a snapshot.
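>>>> >>>>
>>>> >>>> In the same spirit as that reader (a rough sketch, not the actual
>>>> >>>> ParquetValueReaders code), a promoting read is just a widening
>>>> >>>> conversion at decode time:
>>>> >>>>
>>>> >>>>   import java.util.PrimitiveIterator;
>>>> >>>>
>>>> >>>>   // The read schema says long, but the data file stores INT32:
>>>> >>>>   // wrap the int column reader and widen each value as it is
>>>> >>>>   // decoded.
>>>> >>>>   final class IntAsLongReader {
>>>> >>>>     private final PrimitiveIterator.OfInt stored; // decoded INT32 values
>>>> >>>>
>>>> >>>>     IntAsLongReader(PrimitiveIterator.OfInt stored) {
>>>> >>>>       this.stored = stored;
>>>> >>>>     }
>>>> >>>>
>>>> >>>>     long read() {
>>>> >>>>       return stored.nextInt(); // implicit int -> long widening; no file rewrite
>>>> >>>>     }
>>>> >>>>   }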
>>>> >>>>
>>>> >>>> Because files with older schemas can always be read, it is safe
>>>> >>>> to write files with an older schema. This happens fairly
>>>> >>>> regularly, as Steven noted, in cases where a writer has a fixed
>>>> >>>> schema and is long-running.
>>>> >>>>
>>>> >>>> Ryan
>>>> >>>>
>>>> >>>> On Thu, Aug 21, 2025 at 5:37 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>> >>>>>
>>>> >>>>> > This means that you can have writers using different schemas
>>>> >>>>> > to write (use cases include different partitioning or
>>>> >>>>> > "out-of-date" writers), but the data is still valid.
>>>> >>>>>
>>>> >>>>> +1 on Dan's point. Both batch and streaming writers can have
>>>> >>>>> stale schemas. Long-running streaming jobs may stay stale for
>>>> >>>>> extended periods before picking up the new schema during a
>>>> >>>>> restart.
>>>> >>>>>
>>>> >>>>> On Wed, Aug 20, 2025 at 2:50 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>> >>>>>>
>>>> >>>>>> I think I'm going to disagree and argue that it's not really a
>>>> >>>>>> gray area.
>>>> >>>>>>
>>>> >>>>>> The strict schema evolution rules and the way schemas are
>>>> >>>>>> tracked mean that writer and reader schemas are independent yet
>>>> >>>>>> remain compatible due to the evolution rules.
>>>> >>>>>>
>>>> >>>>>> This means that you can have writers using different schemas to
>>>> >>>>>> write (use cases include different partitioning or
>>>> >>>>>> "out-of-date" writers), but the data is still valid.
>>>> >>>>>>
>>>> >>>>>> Promoting the physical representation during a read/scan
>>>> >>>>>> operation results in a presentation consistent with the read
>>>> >>>>>> schema.
>>>> >>>>>>
>>>> >>>>>> All of the representations are technically valid.
>>>> >>>>>>
>>>> >>>>>> -Dan
>>>> >>>>>>
>>>> >>>>>> On Mon, Aug 18, 2025 at 7:46 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>> >>>>>>>
>>>> >>>>>>> +1 to what Micah said :) sorry about the typo
>>>> >>>>>>>
>>>> >>>>>>> On Mon, Aug 18, 2025 at 9:45 AM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>> >>>>>>>>
>>>> >>>>>>>> +1 to what Micaah. We have never really written rules about
>>>> >>>>>>>> what is "allowed" in this particular context, but since a
>>>> >>>>>>>> reader needs to be able to handle both int/long values for
>>>> >>>>>>>> the column, there isn't really any danger in writing new
>>>> >>>>>>>> files with the narrower type. If a reader couldn't handle
>>>> >>>>>>>> this, then type promotion would be impossible.
>>>> >>>>>>>>
>>>> >>>>>>>> I would include all columns in the file; the space
>>>> >>>>>>>> requirements for an all-null column (or all-constant column)
>>>> >>>>>>>> should be very small. I believe the reason we originally
>>>> >>>>>>>> wrote those rules in was to avoid folks doing the Hive-style
>>>> >>>>>>>> implicit columns from the partition tuple (although we also
>>>> >>>>>>>> have handling for this).
>>>> >>>>>>>>
>>>> >>>>>>>> On Sun, Aug 17, 2025 at 11:15 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>>> >>>>>>>>>
>>>> >>>>>>>>> Hi Nic,
>>>> >>>>>>>>>
>>>> >>>>>>>>> This is IMO a gray area.
>>>> >>>>>>>>>
>>>> >>>>>>>>>> However, is it allowed to commit *new* parquet files with
>>>> >>>>>>>>>> the old types (int) and commit them to the table with a
>>>> >>>>>>>>>> table schema where types are promoted (long)?
>>>> >>>>>>>>>
>>>> >>>>>>>>> IMO, I would expect writers to write files that are
>>>> >>>>>>>>> consistent with the current metadata, so ideally they would
>>>> >>>>>>>>> not be written with int if the type is now long. In
>>>> >>>>>>>>> practice, though, I think most readers are robust to reading
>>>> >>>>>>>>> type-promoted files. We should probably clarify this in the
>>>> >>>>>>>>> specification.
>>>> >>>>>>>>>
>>>> >>>>>>>>>> Also, is it allowed to commit parquet files, in general,
>>>> >>>>>>>>>> which contain only a subset of the columns of the table
>>>> >>>>>>>>>> schema? I.e. if I know a column is all NULLs, can we just
>>>> >>>>>>>>>> skip writing it?
>>>> >>>>>>>>>
>>>> >>>>>>>>> As currently worded, the spec on writing data files
>>>> >>>>>>>>> (https://iceberg.apache.org/spec/#writing-data-files) says
>>>> >>>>>>>>> files should include all columns. Based on the column
>>>> >>>>>>>>> projection rules, however, failing to do so should also not
>>>> >>>>>>>>> cause problems.
>>>> >>>>>>>>>
>>>> >>>>>>>>> Cheers,
>>>> >>>>>>>>> Micah
>>>> >>>>>>>>>
>>>> >>>>>>>>> On Fri, Aug 15, 2025 at 8:45 AM Nicolae Vartolomei <n...@nvartolomei.com.invalid> wrote:
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Hi,
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I'm implementing an Iceberg writer[^1] and have a question
>>>> >>>>>>>>>> about what type promotion actually means as part of the
>>>> >>>>>>>>>> schema evolution rules.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> The Iceberg spec [specifies][spec-evo] which type
>>>> >>>>>>>>>> promotions are allowed. No confusion there.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> The confusion on my end arises when it comes to actually
>>>> >>>>>>>>>> writing, e.g., parquet data. Let's take for example the int
>>>> >>>>>>>>>> to long promotion. What is actually allowed under this
>>>> >>>>>>>>>> promotion rule? Let me try to show what I mean.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Obviously, if I have a schema-id N with field A of type int
>>>> >>>>>>>>>> and table snapshots with this schema, then it is possible
>>>> >>>>>>>>>> to update the table schema-id to > N where field A now has
>>>> >>>>>>>>>> type long, and this new schema can read parquet files with
>>>> >>>>>>>>>> the old type.
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> However, is it allowed to commit *new* parquet files with
>>>> >>>>>>>>>> the old types (int) and commit them to the table with a
>>>> >>>>>>>>>> table schema where types are promoted (long)?
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> Also, is it allowed to commit parquet files, in general,
>>>> >>>>>>>>>> which contain only a subset of the columns of the table
>>>> >>>>>>>>>> schema? I.e. if I know a column is all NULLs, can we just
>>>> >>>>>>>>>> skip writing it?
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> I appreciate you taking the time to look at this,
>>>> >>>>>>>>>> Nic
>>>> >>>>>>>>>>
>>>> >>>>>>>>>> [spec-evo]: https://iceberg.apache.org/spec/#schema-evolution
>>>> >>>>>>>>>> [^1]: This is for the Redpanda to Iceberg native
>>>> >>>>>>>>>> integration (https://github.com/redpanda-data/redpanda).