Re: [DISCUSS] Support keeping at most N snapshots
The intention is to set an upper limit for table size while keeping as much snapshots as possible. Setting max-snapshot-age-ms to a small value will lose history for some tables while setting min-snapshots-to-keep to a medium value will keep too much history for others. Lewis, William 于2025年1月18日 周六03:50写道: > To clarify, the intention is that if max-snapshots-to-keep is set, then > snapshots will be expired even if they are younger than > max-snapshot-age-ms? Does this solve a problem that isn’t solved by setting > max-snapshot-age-ms to a small value and setting min-snapshots-to-keep to > your desired value? > > >
Re: [DISCUSS] Support keeping at most N snapshots
To clarify, the intention is that if max-snapshots-to-keep is set, then snapshots will be expired even if they are younger than max-snapshot-age-ms? Does this solve a problem that isn’t solved by setting max-snapshot-age-ms to a small value and setting min-snapshots-to-keep to your desired value?
Re: [Discuss][Vote] Spec Change - Add optional field added-rows to Snapshot for Row Lineage
+0, as I agree with Amogh, I think it would fit nicely with Honah's work of formalizing the properties. Kind regards, Fokko Op vr 17 jan 2025 om 08:55 schreef Honah J. : > +1 > > Best, > Honah > > On Thu, Jan 16, 2025 at 22:54 Manish Malhotra < > manish.malhotra.w...@gmail.com> wrote: > >> +1, thanks Russel! >> this will help other engines as well. >> >> >> Thanks, >> Manish >> >> On Thu, Jan 16, 2025 at 3:15 PM Amogh Jahagirdar <2am...@gmail.com> >> wrote: >> >>> I'm +0. I definitely agree with the premise that we need a spec change >>> to ensure added rows exist at the snapshot level for row lineage, but I >>> feel like there is an advantage to just formalizing the added-records >>> snapshot summary property, and make it required for writers in case row >>> lineage is enabled on the table. The advantage is that in the ecosystem >>> more implementations are likely to populate the summary already (beyond the >>> Java implementation, I see Python does as well) so for those >>> implementations, the lift to support row lineage is a little bit reduced >>> since the field will probably already be populated. It also avoids any >>> awkwardness around having 2 of essentially the same field in metadata. >>> >>> In the end, I think that is a minor advantage so I'm not very >>> opinionated on this. We're talking about one field, and the additional lift >>> for that is some slightly additional parsing handling in implementations >>> which in the grand scheme of things is a smaller portion of the work >>> involved. I also understand the argument that it's awkward to have required >>> fields be in the summary in the first place (thinking back to our >>> discussions around operation handling). >>> >>> Thanks, >>> >>> Amogh Jahagirdar >>> >>> On Thu, Jan 16, 2025 at 2:52 PM Prashant Singh >>> wrote: >>> +1 (non-binding) ! Best, Prashant Singh On Thu, Jan 16, 2025 at 1:14 PM Daniel Weeks wrote: > +1 > > On Thu, Jan 16, 2025 at 10:39 AM Steve Zhang > wrote: > >> Thank you Russell! +1 (non-binding) >> >> Thanks, >> Steve Zhang >> >> >> >> On Jan 15, 2025, at 10:53 PM, huaxin gao >> wrote: >> >> +1 (non-binding) >> >> >>
Re: [Discuss][Vote] Spec Change - Add optional field added-rows to Snapshot for Row Lineage
Thanks everyone, especially Amogh and Fokko for providing other views on this change. I think your viewpoints are definitely valid although I think currently we should be working on moving things away from Snapshot Summary if at all possible. I think it's better if the information there is never critical for the correct functioning of an Iceberg implementation and I think in the future it would probably be better if we got operation out of there as well. That said I think we have consensus and I'll merge this change after the last review comments. On Fri, Jan 17, 2025 at 4:22 AM Jean-Baptiste Onofré wrote: > +1 (non binding) > > Regards > JB > > On Wed, Jan 15, 2025 at 5:59 PM Russell Spitzer > wrote: > > > > Hi Everyone! > > > > PR: https://github.com/apache/iceberg/pull/11976/files > > > > Split out from #11948 > > > > Working on the row-lineage implementation made it clear that we needed a > way to get information from the Snapshot object propagated into the > Metadata layer. Specifically we need to know the count of all newly added > rows in that Snapshot to change the last-row-id of the TableMetadata. While > we can potentially read this from Snapshot Summary, it would be a bit odd > to have a requirement on key value pair within Snapshot summary. We could > also potentially re-read and calculate the number of added rows, but this > would require re-opening the manifest list itself. > > > > I believe it makes more sense to formally have added-rows as an optional > field within the Snapshot itself so we can make it clear in the spec this > value is expected to be stored within the Snapshot metadata if row-lineage > is enabled. > > > > > > Please take a look at the PR and signal vote or not you approve of > adding this additional field to Snapshot >
Re: [Discuss][Vote] Spec Change - Add optional field added-rows to Snapshot for Row Lineage
+1 (non binding) Regards JB On Wed, Jan 15, 2025 at 5:59 PM Russell Spitzer wrote: > > Hi Everyone! > > PR: https://github.com/apache/iceberg/pull/11976/files > > Split out from #11948 > > Working on the row-lineage implementation made it clear that we needed a way > to get information from the Snapshot object propagated into the Metadata > layer. Specifically we need to know the count of all newly added rows in that > Snapshot to change the last-row-id of the TableMetadata. While we can > potentially read this from Snapshot Summary, it would be a bit odd to have a > requirement on key value pair within Snapshot summary. We could also > potentially re-read and calculate the number of added rows, but this would > require re-opening the manifest list itself. > > I believe it makes more sense to formally have added-rows as an optional > field within the Snapshot itself so we can make it clear in the spec this > value is expected to be stored within the Snapshot metadata if row-lineage is > enabled. > > > Please take a look at the PR and signal vote or not you approve of adding > this additional field to Snapshot