I'm coming at this from a mental model where the producer(s) for a given
Table are tightly coupled to a specific Schema. That is, even as the Table's
Schema evolves, the producer's logic stays unchanged: it keeps producing
Parquet files that have the same Parquet metadata and columns. (This model
may p
We've considered this in the past and I'm undecided on it. There is some
benefit, like being able to prune files during planning if a file doesn't
contain a column that is used in a filter that rejects nulls (e.g.
`new_data_column IN ("a", "b")`).
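
A minimal sketch of that pruning idea, under stated assumptions: `DataFileInfo`, its `schema_id` field, and `prune_by_schema` are hypothetical names for illustration, not Iceberg APIs, and columns are matched by name here for brevity, whereas Iceberg tracks columns by field ID. The point is only that a file whose writer schema lacks the filtered column holds nothing but nulls for it, so a null-rejecting filter can skip that file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileInfo:
    """Hypothetical planning-time view of a data file."""
    path: str
    schema_id: int  # hypothetical: id of the schema the writer used


def prune_by_schema(files, columns_by_schema_id, filter_column):
    """Keep only files that could match a null-rejecting filter on filter_column.

    columns_by_schema_id maps a schema id to the set of column names
    that schema contains.
    """
    kept = []
    for f in files:
        columns = columns_by_schema_id.get(f.schema_id)
        # Unknown schema id: keep the file, because absence of the
        # column cannot be proven.
        if columns is None or filter_column in columns:
            kept.append(f)
    return kept
```

Files written with an unknown schema id are kept on purpose, so the sketch only ever skips files it can prove are irrelevant.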
On the other hand, we don't want data files that were writt
Thanks for the info, it is very helpful. I can see it when debugging down
through `org.apache.iceberg.ManifestReader#readMetadata`. It wasn't obvious
to me that this sort of data would be in the Avro metadata as opposed to the
`org.apache.iceberg.ManifestFile` object. I may have some questions later
about the
Hi Devin,
The schema-id is stored in the Manifest Avro header:
https://iceberg.apache.org/spec/#manifests
The schema itself is also stored there. Would that help your situation? I
think this makes adding it to the data file redundant.
Kind regards,
Fokko
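
The header lookup described above can be sketched without any Iceberg or Avro library, assuming only the Avro object container file layout (magic bytes `Obj\x01`, then a map of string keys to byte values) and the `schema` and `schema-id` metadata keys listed in the Iceberg spec. The function names below are illustrative, not part of any library, and `schema-id` is treated as possibly absent.

```python
import io
import json

AVRO_MAGIC = b"Obj\x01"


def read_long(stream):
    """Decode one Avro long (zigzag-encoded variable-length integer)."""
    shift = 0
    accum = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of Avro header")
        accum |= (chunk[0] & 0x7F) << shift
        if not (chunk[0] & 0x80):
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)


def read_bytes(stream):
    """Decode one length-prefixed Avro bytes/string value."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_avro_header_metadata(stream):
    """Return the file-level metadata map of an Avro container file."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            read_long(stream)
            count = -count
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            metadata[key] = read_bytes(stream)
    return metadata


def manifest_schema_info(data):
    """Return (schema_id or None, parsed schema JSON) from manifest bytes."""
    metadata = read_avro_header_metadata(io.BytesIO(data))
    raw_id = metadata.get("schema-id")
    schema_id = int(raw_id.decode("utf-8")) if raw_id is not None else None
    schema = json.loads(metadata["schema"].decode("utf-8"))
    return schema_id, schema
```

Calling `manifest_schema_info(open(path, "rb").read())` on a manifest file would return the schema the manifest was written with. It says nothing about what an individual writer assumed for each data file.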
On Fri, 14 Feb 2025 at 17:56, Devin wrote:
I want to make sure I'm not missing something that already exists;
otherwise, hoping to get a quick thumbs up / thumbs down on a potential
proposal before spending more time on it.
It would be nice to know what Iceberg schema a writer used (or assumed) when
writing a DataFile. Oftentimes, this infor