Hi Nic,

This is IMO a gray area.

> However, is it allowed to commit *new* parquet files with the old
> types (int) and commit them to the table with a table schema where
> types are promoted (long)?

IMO I would expect writers to write files that are consistent with the
current metadata, so ideally they would not be written with int if it is
now long. In general, though, I think most readers are robust to reading
type-promoted files in these cases. We should probably clarify this in the
specification.

> Also, is it allowed to commit parquet files, in general, which contain
> only a subset of columns of table schema? I.e. if I know a column is
> all NULLs, can we just skip writing it?

As currently worded, the spec on writing data files
(https://iceberg.apache.org/spec/#writing-data-files) says data files should
include all columns. Based on the column projection rules
<https://iceberg.apache.org/spec/#column-projection>, however, failing to do
so should not cause problems either.

Cheers,
Micah

On Fri, Aug 15, 2025 at 8:45 AM Nicolae Vartolomei
<n...@nvartolomei.com.invalid> wrote:

> Hi,
>
> I'm implementing an Iceberg writer[^1] and have a question about what
> type promotion actually means as part of schema evolution rules.
>
> Iceberg spec [specifies][spec-evo] which type promotions are allowed.
> No confusion there.
>
> The confusion on my end arises when it comes to actually writing i.e.
> parquet data. Let's take for example the int to long promotion. What
> is actually allowed under this promotion rule? Let me try to show what
> I mean.
>
> Obviously if I have a schema-id N with field A of type int and table
> snapshots with this schema then it is possible to update the table
> schema-id to > N where field A now has type long and this new schema
> can read parquet files with the old type.
>
> However, is it allowed to commit *new* parquet files with the old
> types (int) and commit them to the table with a table schema where
> types are promoted (long)?
>
> Also, is it allowed to commit parquet files, in general, which contain
> only a subset of columns of table schema? I.e. if I know a column is
> all NULLs, can we just skip writing it?
>
> Appreciate taking the time to look at this,
> Nic
>
> [spec-evo]: https://iceberg.apache.org/spec/#schema-evolution
> [^1]: This is for Redpanda to Iceberg native integration
> (https://github.com/redpanda-data/redpanda).
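
For illustration, here is a rough sketch of the reader-side behaviour
discussed in this thread: resolving a data file against the current table
schema by field ID, accepting a promotable file type, and returning null for
a column the file does not contain. It is a toy model in plain Python, not
the Iceberg library API. The names `ALLOWED_PROMOTIONS`, `resolve_type` and
`project_row` are hypothetical, only two of the spec's promotions are listed,
and the other column projection fallbacks (partition metadata, name mapping,
default values) are deliberately left out.

```python
# Toy model of a reader resolving a data file against the table schema.
# All names here are hypothetical; this is not the Iceberg library API.

# Subset of the type promotions permitted by the Iceberg spec.
ALLOWED_PROMOTIONS = {("int", "long"), ("float", "double")}


def resolve_type(file_type, table_type):
    """Return the type the reader produces, or raise if incompatible."""
    if file_type == table_type or (file_type, table_type) in ALLOWED_PROMOTIONS:
        return table_type
    raise ValueError(f"cannot read {file_type} as {table_type}")


def project_row(file_schema, table_schema, row):
    """Project one row of a data file onto the table schema.

    Schemas map field ID -> (name, type); `row` maps field ID -> value.
    """
    out = {}
    for field_id, (name, table_type) in table_schema.items():
        if field_id not in file_schema:
            # Assumption: none of the other projection fallbacks apply,
            # so the missing column is read as null.
            out[name] = None
            continue
        _, file_type = file_schema[field_id]
        # Raises if the file type cannot be promoted to the table type.
        resolve_type(file_type, table_type)
        # Python ints are unbounded, so int -> long needs no value change.
        out[name] = row[field_id]
    return out


# The table has promoted field 1 to long and has a column "b" that the
# file omits; the file still stores field 1 as int.
table_schema = {1: ("a", "long"), 2: ("b", "string")}
file_schema = {1: ("a", "int")}
result = project_row(file_schema, table_schema, {1: 42})
```

In this model a file written with the old type, or without an all-NULL
column, is still readable. That matches the "readers are robust" observation
above, but it does not settle whether a writer is allowed to produce such
files.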