Re: Question on record_count field in the data-file entry of a manifest file

2021-04-07 Thread Russell Spitzer
I don't think anything actually uses record counts at the moment, but if you include them they should be correct. In general, we also allow any metric to be empty, which is treated as "unknown". This looks like what we currently do with Avro. When we import Avro files in Spark we skip doing any fi
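
To illustrate the "empty means unknown" convention described above, here is a minimal Python sketch. The `DataFileEntry` class and the `total_record_count` helper are hypothetical names invented for this example and are not part of Iceberg's API; the sketch only assumes that a missing metric should be treated as unknown rather than as zero.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DataFileEntry:
    # Hypothetical stand-in for a data-file entry in a manifest (not Iceberg's API).
    file_path: str
    file_size_in_bytes: int
    record_count: Optional[int] = None  # None models an empty, i.e. "unknown", metric


def total_record_count(entries: Iterable[DataFileEntry]) -> Optional[int]:
    """Sum record counts, or return None if any entry's count is unknown."""
    total = 0
    for entry in entries:
        if entry.record_count is None:
            # An unknown count must not be treated as 0; the total is unknown too.
            return None
        total += entry.record_count
    return total
```

A caller that gets `None` back would have to fall back to reading the files themselves; a caller that gets a number is trusting that every recorded count is correct.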

Re: Question on record_count field in the data-file entry of a manifest file

2021-04-07 Thread Vivekanand Vellanki
I understand the part about the file sizes. The file size recorded in the manifest files can be used to read the Parquet/ORC footers, assuming that size is accurate. My question is specific to the record counts in these files. Are these expected to be accurate as well? On Wed, Apr 7, 2021 at 5:53 PM wrote
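
As a rough illustration of why the file size matters for footer reads, the Python sketch below locates a Parquet footer using only a trusted file size. It relies on the Parquet layout in which a file ends with a 4-byte little-endian footer length followed by the magic bytes `PAR1`. The function name `footer_range` is hypothetical, and this is not how Iceberg itself reads footers.

```python
import struct

PARQUET_MAGIC = b"PAR1"


def footer_range(stream, file_size):
    """Return (offset, length) of the Parquet footer metadata, trusting file_size.

    `stream` is any seekable binary file object; `file_size` would come from
    the manifest rather than from a separate file-status call.
    """
    # Smallest possible layout: leading magic + footer length + trailing magic.
    if file_size < 12:
        raise ValueError("file too small to be a Parquet file")
    stream.seek(file_size - 8)
    tail = stream.read(8)
    if len(tail) != 8 or tail[4:] != PARQUET_MAGIC:
        raise ValueError("not a Parquet file, or file_size is wrong")
    (footer_len,) = struct.unpack("<I", tail[:4])
    if footer_len > file_size - 12:
        raise ValueError("footer length exceeds file size")
    return file_size - 8 - footer_len, footer_len
```

If the recorded size is off by even one byte, the magic check fails, which is why the size in the manifest has to be exact for this shortcut to work.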

Re: Question on record_count field in the data-file entry of a manifest file

2021-04-07 Thread russell . spitzer
Iceberg stores this information and other footer and file-level details in manifests for just such a use case. The goal is always to read the files once and then save metrics and statistics in the manifest so they do not need to be read again. If the value is not accurate there is a bug in Iceber