Re: [VOTE] Deprecate or remove distinct_count

2025-02-11 Thread Xuanwo
Here is my +1 binding. The current status of `distinct_count` is quite confusing, which has also led to additional discussions in `iceberg-rust` about whether we need to add it and how to maintain it. Removing it seems reasonable to me, as there are no known use cases for `distinct_count` in a

Re: [Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-11 Thread Péter Váry
Hi Russell, Thanks for bringing this up! I think equality deletes are not the root of the problem here. - If we have a positional delete, and the new row doesn't include the old rowId, then the lineage info is lost. - If we have an equality delete, and the new row contains the rowId, then we have

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Gang Wu
+1 (non-binding)

Re: [Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-11 Thread Gang Wu
Thanks Steven for the explanation! Yes, you're right that solely rewriting delete files does not help. IIUC, Iceberg is the only table format that does not produce changelog files. Is there any chance to recompute the row_id of updated rows by tracking changes of the identifier fields between snap

Re: [Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-11 Thread Steven Wu
I am fine with the proposed spec change. While it "supports/allows" equality deletes, row lineage semantics needn't/can't be maintained properly for equality deletes (compared to position deletes). Gang pointed out a couple of issues with the implications. But we have no choice but to live with those

Re: [Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-11 Thread Gang Wu
Hi Russell, Thanks for adding equality delete support to row lineage! > accept that "updates" will be treated as "delete" and "insert" I would say that this has the following obvious drawbacks (though it is better than no support at all): 1) updates will be populated differently when outputting changelogs to use

Re: [VOTE] Release Apache Iceberg 1.8.0 RC0

2025-02-11 Thread Jean-Baptiste Onofré
+1 (non binding) I checked: - hash and checksum are good - all LICENSE and NOTICE are good (including the last fixes we did) - no binary file in the source distribution - ASF header is present in all expected files - I was able to build from the source distribution - I did quick tests with spark a

Re: [VOTE] Simplify multi-arg table metadata

2025-02-11 Thread Wing Yew Poon
+1 (non-binding) On Mon, Feb 10, 2025 at 10:26 AM Yufei Gu wrote: > +1 > Yufei > > > On Mon, Feb 10, 2025 at 9:48 AM Steve Zhang > wrote: > >> +1 (non-binding). >> >> Thanks, >> Steve Zhang >> >> >> >> On Feb 9, 2025, at 1:01 AM, Fokko Driesprong wrote: >> >> (Second attempt, the cat

[Announce] Apache Iceberg Community Meetup in SF and Seattle

2025-02-11 Thread Kevin Liu
Hey folks, We have more Iceberg community meetups coming up on *February 27* in both *San Francisco* and *Seattle*! Here are the registration links: * San Francisco: https://lu.ma/77zbx044 * Seattle: https://lu.ma/44yd7yo5 For the Seattle event, we're also looking for speakers! If you're interest

[Discussion] Spec change for Row Lineage - Allow Equality Deletes

2025-02-11 Thread Russell Spitzer
Hi Y'all, As we have been working on the row lineage implementation, a few folks in the community have reached out to me about changing our defined behavior around equality deletes. Currently, when Row Lineage is enabled, the spec says to disable equality deletes for the table

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Amogh Jahagirdar
+1 thanks for driving this Gabor!

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread rdb...@gmail.com
+1

Re: [VOTE] Release Apache Iceberg 1.8.0 RC0

2025-02-11 Thread rdb...@gmail.com
+1 * Validated signature and checksum * Ran RAT checks * Ran tests that didn't require Docker in Java 17 As a follow up, I think that we should move any tests that require Docker to integrationTest rather than test. We should try not to rely on Docker containers in normal unit tests because conta

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Steve Zhang
+1 nb Thanks, Steve Zhang

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Honah J.
+1

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Christian Thiel
+1 (non-binding) Thanks Gabor!

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Yufei Gu
+1 Yufei

Re: [VOTE] Release Apache Iceberg 1.8.0 RC0

2025-02-11 Thread Kevin Liu
+1 (non binding) Checked signature, checksum, license, and tests. Had a few flaky tests running on an M1 Mac, listed below. I reran the tests on Ubuntu using GitHub runners and they completed successfully. I also t

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Steven Wu
+1

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Russell Spitzer
+1

[DISCUSS] FileFormat API proposal

2025-02-11 Thread Péter Váry
Hi Team, As mentioned earlier at our Community Sync, I am exploring the possibility of defining a FileFormat API for accessing different file formats. I have put together a proposal based on my findings. --- Iceberg currently supports 3 different file formats: Avro, Parquet, ORC. Wit

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Fokko Driesprong
+1

Re: [VOTE] Deprecate or remove distinct_count

2025-02-11 Thread Fokko Driesprong
My mistake, I suggested sending out an email with a quick vote on the PR. I like the suggestion to use this thread for discussion since the number of options is limited. I'm in favor of deprecating the field, so that we avoid reusing the field-id in the future. Kind regards, Fokko On Tue, 11 Feb 20

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Jean-Baptiste Onofré
+1 (non binding) Regards JB

Re: [VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Eduard Tudenhöfner
+1

[VOTE] Add RemoveSchemas update type to REST spec

2025-02-11 Thread Gabor Kaszab
Hi Iceberg Community, I'm working on removing the unused schemas from the table metadata when running snapshot expiration. One part of this work is a change in REST spec to add a new update type for removing schemas. I'd like to start a vote on this REST spec change: https://github.com/apache/ice