Hi everyone,

Happy new year! I've updated the proposal and PR: the optional snapshot summary fields are now documented in a new appendix in the table spec, and the review comments have been addressed. You can find the links below:
- proposal doc <https://docs.google.com/document/d/1Gt1ZOXVXK60IGdlmt4QlyRzaZ1iCVyYUBfMJCsiz14I/edit?usp=sharing>
- PR #11660 <https://github.com/apache/iceberg/pull/11660>

Please take a moment to review them when you have the chance, and feel free to share any thoughts or questions you may have.

Best regards,
Honah

On Tue, Dec 17, 2024 at 10:02 PM Honah J. <hon...@apache.org> wrote:

> Thank you all for the feedback!
>
> It appears we have reached a consensus on documenting the snapshot summary fields. Additionally, there is a preference to document these fields outside the main body of the spec and make sure they are not tied to the spec version.
>
> Two options have been suggested:
>
> 1. Documenting them on a new page at the same level as table configuration.
> 2. Including them in an appendix within the spec.
>
> Option 1 offers greater flexibility for future additions and modifications. However, snapshot summary fields might be too low-level to include alongside user-facing topics like Configuration, Schemas, and Partitioning. Moreover, referencing versioned documentation within the spec might not be feasible.
>
> Option 2 provides a more balanced approach, separating these details from the main spec while keeping them within the same document.
>
> I will update the proposal and PR to adopt option 2, moving these fields to an appendix in the spec.
>
> Thank you again for your valuable feedback!
>
> Best regards,
> Honah
>
> On Mon, Dec 16, 2024 at 5:26 AM Fokko Driesprong <fo...@apache.org> wrote:
>
>> I'm in favor of this as well. While working on PyIceberg I had to deduce this from the Java code; having a more condensed version in the appendix of the spec would be great.
>>
>> Kind regards,
>> Fokko
>>
>> On Mon, Dec 16, 2024 at 14:21, Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>
>>> Hi,
>>>
>>> yes I agree, I don't think we have to couple this to a spec version.
>>>
>>> Regards
>>> JB
>>>
>>> On Wed, Dec 11, 2024 at 11:17 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>> >
>>> > I want to float this back up. I think this is a really good idea for cross-engine support. I don't think we have to tie this to any specific spec version since these are just recommendations, so I think we can do this at any time.
>>> >
>>> > On Wed, Nov 27, 2024 at 1:31 PM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>> >>
>>> >> This makes sense to me generally. I've tried a few times to search in the spec to find a list of possible snapshot summary properties, and was a bit surprised to not find them there. So I think this would be a nice addition.
>>> >>
>>> >> I'm curious if there's any historical reason it's not been included in the spec.
>>> >>
>>> >> Thanks
>>> >> Szehon
>>> >>
>>> >> On Wed, Nov 27, 2024 at 10:55 AM Kevin Liu <kevinjq...@apache.org> wrote:
>>> >>>
>>> >>> Thanks for driving this Honah!
>>> >>>
>>> >>> It's important to have a consistent naming scheme so that we don't need to worry about edge cases when using multiple engines, and possibly have to deal with migrations.
>>> >>>
>>> >>> Also, since users can store arbitrary key/value pairs in the summary property, it's good to document the currently used properties to avoid collisions.
>>> >>>
>>> >>> I like the proposal to document all properties in a "snapshot summary" table; this will ensure a centralized place to view all possible key/value pairs, similar to how FileIO configuration is handled in iceberg-python. Other implementations can use this table as a reference.
>>> >>>
>>> >>> > This approach offers flexibility, as new fields can be added through documentation updates without requiring specification changes.
>>> >>> This will save a lot of effort since specification changes require greater scrutiny.
>>> >>>
>>> >>> > summary details would not be located near the Snapshot section, which explains the summary field.
>>> >>> We can link the table to the Snapshot section.
>>> >>>
>>> >>> Would love to hear others' thoughts on this.
>>> >>>
>>> >>> Best,
>>> >>> Kevin Liu
>>> >>>
>>> >>> On Tue, Nov 26, 2024 at 2:50 PM Honah J. <hon...@apache.org> wrote:
>>> >>>>
>>> >>>> Hi everyone,
>>> >>>>
>>> >>>> I’d like to propose an addition to the table specification to document optional fields in the snapshot summary.
>>> >>>>
>>> >>>> Currently, the snapshot summary includes a required operation field and various optional fields. While these optional fields, such as metrics and partition-level summaries, are supported by Java and Python implementations, they are not officially documented. This creates risks of inconsistency as other implementations and engines adopt and interact with these fields.
>>> >>>>
>>> >>>> I propose adding a new section to the table specification to document these optional fields, ensuring consistent naming conventions and reducing ambiguity across implementations. While this is the primary proposal, it may also be worth discussing whether documenting these fields separately in Docs/Table would provide additional flexibility for future updates.
>>> >>>>
>>> >>>> I’d love to hear your thoughts, suggestions, or concerns about this proposal.
>>> >>>>
>>> >>>> Looking forward to the discussion!
>>> >>>>
>>> >>>> Links
>>> >>>>
>>> >>>> GitHub tracking issue: https://github.com/apache/iceberg/issues/11659
>>> >>>> Proposal: https://docs.google.com/document/d/1Gt1ZOXVXK60IGdlmt4QlyRzaZ1iCVyYUBfMJCsiz14I/edit?usp=sharing
>>> >>>> PR: https://github.com/apache/iceberg/pull/11660
>>> >>>>
>>> >>>> Best regards,
>>> >>>> Honah