This makes sense to me generally. I've tried a few times to search the
spec for a list of possible snapshot summary properties, and was a bit
surprised not to find them there, so I think this would be a nice addition.

I'm curious whether there's any historical reason they haven't been
included in the spec.

Thanks
Szehon

On Wed, Nov 27, 2024 at 10:55 AM Kevin Liu <kevinjq...@apache.org> wrote:

> Thanks for driving this Honah!
>
> It's important to have a consistent naming scheme so that we don't need to
> worry about edge cases when using multiple engines, or potentially have to
> deal with migrations.
>
> Also, since users can store arbitrary key/value pairs in the summary
> property, it's good to document the currently used properties to avoid
> collisions.
>
> I like the proposal to document all properties in a "snapshot summary"
> table; this will provide a centralized place to view all possible key/value
> pairs, similar to how FileIO configuration is handled in iceberg-python
> <https://py.iceberg.apache.org/configuration/#s3>. Other
> implementations can use this table as a reference.
>
> > This approach offers flexibility, as new fields can be added through
> > documentation updates without requiring specification changes.
> This will save a lot of effort since specification changes require
> greater scrutiny.
>
> > summary details would not be located near the Snapshot section, which
> > explains the summary field.
> We can link the table to the Snapshot section.
>
>
> Would love to hear others' thoughts on this.
>
> Best,
> Kevin Liu
>
> On Tue, Nov 26, 2024 at 2:50 PM Honah J. <hon...@apache.org> wrote:
>
>> Hi everyone,
>>
>> I’d like to propose an addition to the table specification to document
>> optional fields in the snapshot summary.
>>
>> Currently, the snapshot summary includes a required operation field and
>> various optional fields. While these optional fields—such as metrics and
>> partition-level summaries—are supported by Java
>> <https://github.com/apache/iceberg/blob/549674b3fc0cdb18d6cad3e2d6320236fba8c562/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L32-L64>
>> and Python
>> <https://github.com/HonahX/iceberg-python/blob/45d611fe351f6f3847bf329aa053d890d810e2b6/pyiceberg/table/snapshots.py#L36-L60>
>> implementations, they are not officially documented. This creates risks of
>> inconsistency as other implementations and engines adopt and interact with
>> these fields.
>>
>> I propose adding a new section to the table specification to document
>> these optional fields, ensuring consistent naming conventions and reducing
>> ambiguity across implementations. While this is the primary proposal, it
>> may also be worth discussing whether documenting these fields separately in
>> Docs/Table would provide additional flexibility for future updates.
>>
>> I’d love to hear your thoughts, suggestions, or concerns about this
>> proposal.
>>
>> Looking forward to the discussion!
>>
>> Links
>>
>>    - GitHub tracking issue:
>>    https://github.com/apache/iceberg/issues/11659
>>    - Proposal:
>>    https://docs.google.com/document/d/1Gt1ZOXVXK60IGdlmt4QlyRzaZ1iCVyYUBfMJCsiz14I/edit?usp=sharing
>>    - PR: https://github.com/apache/iceberg/pull/11660
>>
>>
>> Best regards,
>> Honah
>>
>
