I think especially with support for default values, it's important for the
writer to always produce the column. Otherwise, operations that would
arguably be safe become confusing and can return incorrect values. For
instance, moving a file from a spec with an identity partition to one that
drops that partition field.
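To make the hazard concrete, here is a toy sketch (hypothetical names, not
Iceberg code, assuming the reader falls back to a schema default when the
field is no longer an identity partition source):

```python
# Toy illustration: if the writer omitted the identity column and the
# reader no longer treats the field as a partition source, it falls back
# to the schema default -- silently returning the wrong value.

FILE_PARTITION = {"region": "eu"}       # partition metadata at write time
SCHEMA_DEFAULT = {"region": "unknown"}  # default after the spec change

def read_region(row, still_identity_partitioned):
    if "region" in row:                      # column physically present
        return row["region"]
    if still_identity_partitioned:           # safe: project from metadata
        return FILE_PARTITION["region"]
    return SCHEMA_DEFAULT["region"]          # confusing: silently wrong

row = {"id": 1}                              # writer skipped the column
assert read_region(row, True) == "eu"        # correct while partitioned
assert read_region(row, False) == "unknown"  # incorrect after the drop
```

Always writing the column makes the first branch taken in every case, so
the spec change cannot alter the returned value.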
I might have missed it, but in skimming I couldn't find a section in the
spec about writing all columns to the data file.
I posted https://github.com/apache/iceberg/pull/10835 which says
implementations should write the column for redundancy but leaves the
option open for others.
Thanks,
Micah
I support clarifying how to handle this when reading. It's definitely a
best practice to project the values from metadata because the columns may
not exist in data files when the files were converted from Hive.
For writes, I _thought_ that the spec requires writing the values into data
files so that the files are complete on their own.
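The read-side projection described above can be sketched in a few lines
(hypothetical names, not an Iceberg API; assumes identity transforms, where
the partition value equals the source column value exactly):

```python
# Sketch of a reader projecting an identity partition value from partition
# metadata onto rows whose data file lacks the column, e.g. files that
# were converted from Hive and never wrote the partition column.

def project_partition_column(rows, partition_data, column):
    """Fill `column` from partition metadata when absent from the file."""
    for row in rows:
        if column not in row:  # column missing in the physical file
            # identity transform: the metadata value IS the column value
            row[column] = partition_data[column]
        yield row

# Rows read from a Hive-converted file that never wrote `event_date`:
rows = [{"id": 1}, {"id": 2}]
partition = {"event_date": "2024-07-25"}
out = list(project_partition_column(rows, partition, "event_date"))
# out == [{"id": 1, "event_date": "2024-07-25"},
#         {"id": 2, "event_date": "2024-07-25"}]
```

This only works for identity transforms; for bucket, truncate, etc. the
source values cannot be recovered from the partition tuple.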
I have no problem with explicitly stating that writing identity source
columns is optional on write. We should, of course, mandate surfacing the
column on read :)
On Thu, Jul 25, 2024 at 1:30 PM Micah Kornfield wrote:
The Table specification doesn't mention anything about requirements for
whether writing identity partitioned columns is necessary. Empirically, it
appears that implementations always write the column data, at least for
Parquet. For columnar formats this is relatively cheap, as a constant
column is trivially RLE-encoded.
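A quick illustration of why it's cheap (plain Python, not a real Parquet
encoder): within one data file an identity-partitioned column holds a
single value, so run-length encoding collapses it to one (value, count)
pair no matter how many rows the file has.

```python
from itertools import groupby

def rle(values):
    """Naive run-length encoding: list of (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# An identity-partitioned column is constant within a file, so it
# collapses to a single run regardless of row count:
column = ["2024-07-25"] * 1_000_000
assert rle(column) == [("2024-07-25", 1_000_000)]
```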