Thanks everyone, especially Amogh and Fokko for providing other views on this change. I think your viewpoints are definitely valid although I think currently we should be working on moving things away from Snapshot Summary if at all possible. I think it's better if the information there is never critical for the correct functioning of an Iceberg implementation and I think in the future it would probably be better if we got operation out of there as well.
That said I think we have consensus and I'll merge this change after the last review comments. On Fri, Jan 17, 2025 at 4:22 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote: > +1 (non binding) > > Regards > JB > > On Wed, Jan 15, 2025 at 5:59 PM Russell Spitzer > <russell.spit...@gmail.com> wrote: > > > > Hi Everyone! > > > > PR: https://github.com/apache/iceberg/pull/11976/files > > > > Split out from #11948 > > > > Working on the row-lineage implementation made it clear that we needed a > way to get information from the Snapshot object propagated into the > Metadata layer. Specifically we need to know the count of all newly added > rows in that Snapshot to change the last-row-id of the TableMetadata. While > we can potentially read this from Snapshot Summary, it would be a bit odd > to have a requirement on key value pair within Snapshot summary. We could > also potentially re-read and calculate the number of added rows, but this > would require re-opening the manifest list itself. > > > > I believe it makes more sense to formally have added-rows as an optional > field within the Snapshot itself so we can make it clear in the spec this > value is expected to be stored within the Snapshot metadata if row-lineage > is enabled. > > > > > > Please take a look at the PR and signal vote or not you approve of > adding this additional field to Snapshot >