The Table specification doesn't say whether writing identity partitioned
column values into data files is required.  Empirically, implementations
appear to always write the column data, at least for Parquet.  For columnar
formats this is relatively cheap since the values are trivially RLE
encodable, but for Avro it comes at a somewhat higher cost.  Since the data
is fully reproducible from Iceberg metadata, I think the specification
should explicitly state that writing these columns is optional.
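To illustrate what I mean by "reproducible from Iceberg metadata", here is a
rough, untested sketch against the Java API (the table layout and column
names are made up) showing that the identity partition value is recorded in
the manifest entry for each data file, regardless of whether the column is
also written into the file itself:

import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.required(2, "event_date", Types.DateType.get()));

// Identity partitioning on event_date.
PartitionSpec spec = PartitionSpec.builderFor(schema)
    .identity("event_date")
    .build();

// The manifest entry built for this file carries the partition tuple, so the
// identity value survives in metadata even if the data file omits the column.
DataFile file = DataFiles.builder(spec)
    .withPath("/warehouse/db/tbl/event_date=2020-01-01/part-00000.parquet")
    .withPartitionPath("event_date=2020-01-01")
    .withRecordCount(1000L)
    .withFileSizeInBytes(128L * 1024 * 1024)
    .build();

System.out.println(file.partition());  // the identity value, from metadata only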

For reading identity partitioned columns from Iceberg tables, I think the
specification needs to require that identity partition column values are
read from metadata.  This is because Iceberg supports migrating Hive data
(and other table formats) without rewriting data files, and those source
files typically don't have the partition values written into them.
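To make the reading side concrete, here is another rough, untested sketch
against the Java API (the variable "table" is assumed to be an already-loaded
Table, and the enclosing method would need to handle IOException from closing
the scan) of a reader filling in identity partition values from file metadata
rather than from file contents:

import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.CloseableIterable;

try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
  for (FileScanTask task : tasks) {
    PartitionSpec spec = task.spec();
    StructLike partition = task.file().partition();
    for (int pos = 0; pos < spec.fields().size(); pos++) {
      PartitionField field = spec.fields().get(pos);
      if (field.transform().isIdentity()) {
        // Constant for every row in this file; a reader can project it as a
        // column even when it isn't present in the file (e.g. migrated Hive data).
        Object value = partition.get(pos, Object.class);
        System.out.printf("file %s: %s = %s%n",
            task.file().path(), field.name(), value);
      }
    }
  }
}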

Thoughts?

When we get consensus I'll open up a PR to clarify these points.

Thanks,
Micah
