Thanks for driving this, Honah!

It's important to have a consistent naming scheme so that we don't need to
worry about edge cases when using multiple engines or, worse, have to deal
with migrations later.

Also, since users can store arbitrary key/value pairs in the summary
property, it's good to document the properties currently in use to avoid
collisions.

I like the proposal to document all properties in a "snapshot summary"
table. This provides a centralized place to view all possible key/value
pairs, similar to how FileIO configuration is documented in iceberg-python
<https://py.iceberg.apache.org/configuration/#s3>. Other
implementations can use this table as a reference.

> This approach offers flexibility, as new fields can be added through
> documentation updates without requiring specification changes.
This will save a lot of effort since specification changes require
greater scrutiny.

> summary details would not be located near the Snapshot section, which
> explains the summary field.
We can link the table to the Snapshot section.
Would love to hear others' thoughts on this.

Best,
Kevin Liu

On Tue, Nov 26, 2024 at 2:50 PM Honah J. <hon...@apache.org> wrote:

> Hi everyone,
>
> I’d like to propose an addition to the table specification to document
> optional fields in the snapshot summary.
>
> Currently, the snapshot summary includes a required operation field and
> various optional fields. While these optional fields (such as metrics and
> partition-level summaries) are supported by the Java
> <https://github.com/apache/iceberg/blob/549674b3fc0cdb18d6cad3e2d6320236fba8c562/core/src/main/java/org/apache/iceberg/SnapshotSummary.java#L32-L64>
> and Python
> <https://github.com/HonahX/iceberg-python/blob/45d611fe351f6f3847bf329aa053d890d810e2b6/pyiceberg/table/snapshots.py#L36-L60>
> implementations, they are not officially documented. This creates a risk of
> inconsistency as other implementations and engines adopt and interact with
> these fields.
>
> I propose adding a new section to the table specification to document
> these optional fields, ensuring consistent naming conventions and reducing
> ambiguity across implementations. While this is the primary proposal, it
> may also be worth discussing whether documenting these fields separately in
> Docs/Table would provide additional flexibility for future updates.
>
> I’d love to hear your thoughts, suggestions, or concerns about this
> proposal.
>
> Looking forward to the discussion!
>
> Links
>
>    - GitHub tracking issue: https://github.com/apache/iceberg/issues/11659
>    - Proposal: https://docs.google.com/document/d/1Gt1ZOXVXK60IGdlmt4QlyRzaZ1iCVyYUBfMJCsiz14I/edit?usp=sharing
>    - PR: https://github.com/apache/iceberg/pull/11660
>
>
> Best regards,
> Honah
>
