The Table specification doesn't say whether writing identity partitioned
column values into data files is required.  Empirically, implementations
appear to always write the column data, at least for Parquet.  For columnar
formats this is relatively cheap since the values are trivially RLE
encodable, but for Avro it comes at a somewhat higher cost.  Since the data
is fully reproducible from Iceberg metadata, I think the specification
should explicitly state that writing these columns is optional.
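To illustrate what I mean by "reproducible from Iceberg metadata", here is a
rough, untested sketch against the Java API (the table layout and column
names are made up) showing that the identity partition value is recorded in
the manifest entry for each data file, regardless of whether the column is
also written into the file itself:

import org.apache.iceberg.DataFile;
import org.apache.iceberg.DataFiles;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.Types;

Schema schema = new Schema(
    Types.NestedField.required(1, "id", Types.LongType.get()),
    Types.NestedField.required(2, "event_date", Types.DateType.get()));

// Identity partitioning on event_date.
PartitionSpec spec = PartitionSpec.builderFor(schema)
    .identity("event_date")
    .build();

// The manifest entry built for this file carries the partition tuple, so the
// identity value survives in metadata even if the data file omits the column.
DataFile file = DataFiles.builder(spec)
    .withPath("/warehouse/db/tbl/event_date=2020-01-01/part-00000.parquet")
    .withPartitionPath("event_date=2020-01-01")
    .withRecordCount(1000L)
    .withFileSizeInBytes(128L * 1024 * 1024)
    .build();

System.out.println(file.partition());  // the identity value, from metadata only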

For reading identity partitioned columns from Iceberg tables, I think the
specification needs to require that identity partition column values are
read from metadata.  This is because Iceberg supports migrating Hive data
(and other table formats) without rewriting data files, and those source
files typically don't have the partition values written into them.
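To make the reading side concrete, here is another rough, untested sketch
against the Java API (the variable "table" is assumed to be an already-loaded
Table, and the enclosing method would need to handle IOException from closing
the scan) of a reader filling in identity partition values from file metadata
rather than from file contents:

import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.PartitionField;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.StructLike;
import org.apache.iceberg.io.CloseableIterable;

try (CloseableIterable<FileScanTask> tasks = table.newScan().planFiles()) {
  for (FileScanTask task : tasks) {
    PartitionSpec spec = task.spec();
    StructLike partition = task.file().partition();
    for (int pos = 0; pos < spec.fields().size(); pos++) {
      PartitionField field = spec.fields().get(pos);
      if (field.transform().isIdentity()) {
        // Constant for every row in this file; a reader can project it as a
        // column even when it isn't present in the file (e.g. migrated Hive data).
        Object value = partition.get(pos, Object.class);
        System.out.printf("file %s: %s = %s%n",
            task.file().path(), field.name(), value);
      }
    }
  }
}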

Thoughts?

When we get consensus I'll open up a PR to clarify these points.

Thanks,
Micah
