Hi Nic,
This is IMO a gray area.

> However, is it allowed to commit *new* parquet files with the old
> types (int) and commit them to the table with a table schema where
> types are promoted (long)?


IMO, I would expect writers to write files that are consistent with the
current metadata, so ideally new files would not be written with int if the
column is now long. That said, I think most readers are robust to reading
files written with the pre-promotion type. We should probably clarify this
in the specification.
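
To make the distinction concrete, here is a minimal sketch of a check a
writer could run before committing a file. It is an illustration only:
check_file_schema and ALLOWED_PROMOTIONS are hypothetical names, not part of
any Iceberg library, and the promotion table covers only a subset of the
promotions the spec lists (decimal precision widening is left out).

```python
# Hypothetical helper, not an Iceberg API. Types are plain strings and
# schemas are dicts mapping field id -> primitive type name.

# Subset of the promotions allowed by the schema evolution rules,
# expressed as (type in file, type in table schema).
ALLOWED_PROMOTIONS = {("int", "long"), ("float", "double")}


def check_file_schema(table_schema, file_schema):
    """Classify every column of a data file against the table schema."""
    result = {}
    for field_id, file_type in file_schema.items():
        table_type = table_schema.get(field_id)
        if table_type is None:
            # The file has a field id the table schema does not know about.
            result[field_id] = "unknown-field"
        elif table_type == file_type:
            # What a writer that follows the current metadata produces.
            result[field_id] = "exact"
        elif (file_type, table_type) in ALLOWED_PROMOTIONS:
            # The gray area: old type in a new file, readable by promotion.
            result[field_id] = "readable-via-promotion"
        else:
            result[field_id] = "incompatible"
    return result


table = {1: "long", 2: "string"}
new_file = {1: "int", 2: "string"}
print(check_file_schema(table, new_file))
```

A strict writer would accept only "exact"; a lenient one might also accept
"readable-via-promotion", which is the case being asked about here.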


> Also, is it allowed to commit parquet files, in general, which contain
> only a subset of columns of table schema? I.e. if I know a column is
> all NULLs, can we just skip writing it?


As currently worded, the spec on writing data files (
https://iceberg.apache.org/spec/#writing-data-files) says data files should
include all columns. Based on the column projection rules
<https://iceberg.apache.org/spec/#column-projection>, however, omitting a
column should not cause problems for readers either.
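
As a rough sketch of why a missing column is harmless to a reader, the
snippet below resolves table fields against a file row by field id. It is
deliberately simplified: project_row and the dict layout are hypothetical,
and it skips the steps in the projection rules that deal with identity
partition values and name mapping, keeping only "use initial-default if
defined, otherwise null".

```python
# Hypothetical, simplified reader-side projection; not an Iceberg API.

def project_row(table_fields, file_row):
    """Build a row in table-schema shape from a row read from a data file.

    table_fields: list of dicts with 'id', 'name' and an optional
                  'initial_default' key.
    file_row:     dict mapping field id -> value present in the file.
    """
    out = {}
    for field in table_fields:
        if field["id"] in file_row:
            out[field["name"]] = file_row[field["id"]]
        else:
            # Column absent from the file: fall back to the default if one
            # is defined, otherwise null (None).
            out[field["name"]] = field.get("initial_default")
    return out


fields = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},  # all-NULL column the writer skipped
]
print(project_row(fields, {1: 10}))
```

Note that if the skipped column had a non-null initial-default, a reader
following these rules would return that default rather than null, so
skipping an all-NULL column is only safe when no such default is defined.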

Cheers,
Micah

On Fri, Aug 15, 2025 at 8:45 AM Nicolae Vartolomei
<n...@nvartolomei.com.invalid> wrote:

> Hi,
>
> I'm implementing an Iceberg writer[^1] and have a question about what
> type promotion actually means as part of schema evolution rules.
>
> Iceberg spec [specifies][spec-evo] which type promotions are allowed.
> No confusion there.
>
> The confusion on my end arises when it comes to actually writing i.e.
> parquet data. Let's take for example the int to long promotion. What
> is actually allowed under this promotion rule? Let me try to show what
> I mean.
>
> Obviously if I have a schema-id N with field A of type int and table
> snapshots with this schema then it is possible to update the table
> schema-id to > N where field A now has type long and this new schema
> can read parquet files with the old type.
>
> However, is it allowed to commit *new* parquet files with the old
> types (int) and commit them to the table with a table schema where
> types are promoted (long)?
>
> Also, is it allowed to commit parquet files, in general, which contain
> only a subset of columns of table schema? I.e. if I know a column is
> all NULLs, can we just skip writing it?
>
> Appreciate taking the time to look at this,
> Nic
>
> [spec-evo]: https://iceberg.apache.org/spec/#schema-evolution
> [^1]: This is for Redpanda to Iceberg native integration
> (https://github.com/redpanda-data/redpanda).
>
