This makes sense to me generally. I've tried a few times to search the spec for a list of possible snapshot summary properties, and was a bit surprised not to find one there. So I think this would be a nice addition.
I'm curious if there's any historical reason it's not been included in the spec.

Thanks
Szehon

On Wed, Nov 27, 2024 at 10:55 AM Kevin Liu <kevinjq...@apache.org> wrote:

> Thanks for driving this Honah!
>
> It's important to have a consistent naming scheme so that we don't need to
> worry about edge cases when using multiple engines, and possibly have to
> deal with migrations.
>
> Also, since users can store arbitrary key/value pairs in the summary
> property, it's good to document the currently used properties to avoid
> collision.
>
> I like the proposal to document all properties in a "snapshot summary"
> table, this will ensure a centralized place to view all possible key/value
> pairs, similar to how FileIO configuration is handled in iceberg-python
> <https://py.iceberg.apache.org/configuration/#s3>. Other
> implementations can use this table as a reference.
>
> > This approach offers flexibility, as new fields can be added through
> > documentation updates without requiring specification changes.
> This will save a lot of effort since specification changes require
> greater scrutiny.
>
> > summary details would not be located near the Snapshot section, which
> > explains the summary field.
> We can link the table to the Snapshot section.
>
> Would love to hear others' thoughts on this.
>
> Best,
> Kevin Liu
>
> On Tue, Nov 26, 2024 at 2:50 PM Honah J. <hon...@apache.org> wrote:
>
>> Hi everyone,
>>
>> I’d like to propose an addition to the table specification to document
>> optional fields in the snapshot summary.
>>
>> Currently, the snapshot summary includes a required operation field and
>> various optional fields. While these optional fields, such as metrics and
>> partition-level summaries, are supported by Java
>> <https://github.com/apache/iceberg/blob/549674b3fc0cdb18d6cad3e2d6320236fba8c562/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L32-L64>
>> and Python
>> <https://github.com/HonahX/iceberg-python/blob/45d611fe351f6f3847bf329aa053d890d810e2b6/pyiceberg/table/snapshots.py#L36-L60>
>> implementations, they are not officially documented. This creates risks of
>> inconsistency as other implementations and engines adopt and interact with
>> these fields.
>>
>> I propose adding a new section to the table specification to document
>> these optional fields, ensuring consistent naming conventions and reducing
>> ambiguity across implementations. While this is the primary proposal, it
>> may also be worth discussing whether documenting these fields separately in
>> Docs/Table would provide additional flexibility for future updates.
>>
>> I’d love to hear your thoughts, suggestions, or concerns about this
>> proposal.
>>
>> Looking forward to the discussion!
>>
>> Links
>>
>>    - GitHub tracking issue:
>>    https://github.com/apache/iceberg/issues/11659
>>    - Proposal:
>>    https://docs.google.com/document/d/1Gt1ZOXVXK60IGdlmt4QlyRzaZ1iCVyYUBfMJCsiz14I/edit?usp=sharing
>>    - PR: https://github.com/apache/iceberg/pull/11660
>>
>> Best regards,
>> Honah
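
For readers following the thread, here is a rough sketch of what "a required operation field plus optional fields" and the collision concern could look like in practice. It is only an illustration: the key names are the ones I believe the linked Java `SnapshotSummary` constants use, the set is deliberately incomplete, and `check_summary` / `find_collisions` are hypothetical helpers, not part of any Iceberg library.

```python
# Operation values and key names are assumed from the spec and the linked
# Java SnapshotSummary constants; this is not the authoritative list.
OPERATIONS = {"append", "replace", "overwrite", "delete"}

# A small, incomplete sample of optional summary keys (assumption).
DOCUMENTED_KEYS = {
    "added-data-files",
    "deleted-data-files",
    "total-data-files",
    "added-records",
    "deleted-records",
    "total-records",
}


def check_summary(summary):
    """Return a list of problems found in a snapshot summary map."""
    problems = []
    op = summary.get("operation")
    if op is None:
        problems.append("missing required field: operation")
    elif op not in OPERATIONS:
        problems.append(f"unknown operation: {op}")
    return problems


def find_collisions(custom):
    """Return user-supplied keys that shadow a documented key."""
    return set(custom) & (DOCUMENTED_KEYS | {"operation"})


if __name__ == "__main__":
    summary = {"operation": "append", "added-records": "10"}
    print(check_summary(summary))
    print(find_collisions({"total-records": "5", "my-app.job-id": "42"}))
```

With a documented table of keys, an engine could run this kind of check before writing custom key/value pairs into the summary, which is the collision case raised above.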