Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-05-07 Thread Walaa Eldin Moustafa
Thanks Steven! So would you agree that resolution using default-catalog and default-namespace does not provide full determinism, and requires a supporting safety mechanism? Thanks, Walaa. On Wed, May 7, 2025 at 10:30 PM Steven Wu wrote: > > If the current model is considered deterministic, do y

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-05-07 Thread Steven Wu
> If the current model is considered deterministic, do you think `default-catalog` and `default-namespace` fields provide enough determinism to eliminate the need for UUIDs when storing table identifiers? I am fine with storing UUIDs for table identifiers in the view. Basically, view creation reso

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-05-07 Thread Walaa Eldin Moustafa
Hi Steven, Thanks for the reply. > I agree with Dan that we shouldn't solve catalog naming in the Iceberg view spec. To clarify, I don't believe the proposal is trying to solve catalog naming. What it’s doing is simply this: * Proposing that table names inside views resolve the same way as they

Re: [DISCUSS] Table Identifiers in Iceberg View Spec

2025-05-07 Thread Steven Wu
I agree with Dan that we shouldn't solve catalog naming in the Iceberg view spec. I am not convinced that the proposed change will make table identifier resolution clearer and more portable. The recommendation to use engines' current catalog and database can cause context-dependent resolution r

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-07 Thread Steven Wu
Xiaoxuan, it is unclear to me what exactly we are trying to achieve here. It started with equality vs. position deletes, but the proposal mentions inverted indexes for every column. Note that equality deletes have the concept of equality fields (similar to a primary key). If we are only talking about row-le

Re: [DISCUSS] Finalizing the v3 spec

2025-05-07 Thread Anton Okolnychyi
Steven, that may be a good point to add to ensure the metadata is properly maintained. If I remember correctly, the Spark implementation already drops old DVs in DELETE/UPDATE/MERGE but the data compaction wasn't doing it originally. I wonder if we fixed it. Eduard may know more. - Anton ср, 7 тр

Re: [DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-07 Thread Zheng Hu
Hi Xiaoxuan, Thanks for the proposal. Equality deletes were designed initially for fast upserts, so that, for example, an upstream CDC stream can be streamed into Iceberg directly with relatively good freshness. I agree that if we are talking about the best performance, then it is partially implemented,

Re: [DISCUSS] Finalizing the v3 spec

2025-05-07 Thread Steven Wu
For the deletion vector change, should we add the following constraint/requirement for the write path in the spec? I don't know if this is already the behavior of the Spark implementation. "if a data file is removed from the table, the corresponding DV reference must also be removed from delete man

[DISCUSS] Pre-Proposal: Improving Merge-On-Read Query Performance With Indexing

2025-05-07 Thread Xiaoxuan Li
Hi team, We've been exploring ways to optimize and balance read and write performance in merge-on-read scenarios for a while. Below are our early ideas, and we’d appreciate community feedback to help validate them against the Iceberg spec, especially any edge cases we might have missed. We’re als

Re: [VOTE] Minor clarification for Geo Spec

2025-05-07 Thread huaxin gao
+1 (non-binding) On Wed, May 7, 2025 at 9:29 AM Denny Lee wrote: > +1 (non-binding) > > On Wed, May 7, 2025 at 8:37 AM Daniel Weeks wrote: > >> +1 (binding) >> >> On Wed, May 7, 2025 at 7:24 AM Russell Spitzer >> wrote: >> >>> +1 (bind) >>> >>> On Wed, May 7, 2025 at 7:32 AM Eduard Tudenhöfner

Re: [VOTE] Minor clarification for Geo Spec

2025-05-07 Thread Denny Lee
+1 (non-binding) On Wed, May 7, 2025 at 8:37 AM Daniel Weeks wrote: > +1 (binding) > > On Wed, May 7, 2025 at 7:24 AM Russell Spitzer > wrote: > >> +1 (bind) >> >> On Wed, May 7, 2025 at 7:32 AM Eduard Tudenhöfner < >> etudenhoef...@apache.org> wrote: >> >>> +1 (binding) >>> >>> On Wed, May 7,

Re: [VOTE] Minor clarification for Geo Spec

2025-05-07 Thread Daniel Weeks
+1 (binding) On Wed, May 7, 2025 at 7:24 AM Russell Spitzer wrote: > +1 (bind) > > On Wed, May 7, 2025 at 7:32 AM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >> +1 (binding) >> >> On Wed, May 7, 2025 at 4:14 AM Gang Wu wrote: >> >>> The clarification is simple and clear from the w

Re: [VOTE] Minor clarification for Geo Spec

2025-05-07 Thread Fokko Driesprong
+1 (b) On Wed, May 7, 2025 at 16:24, Russell Spitzer wrote: > +1 (bind) > > On Wed, May 7, 2025 at 7:32 AM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >> +1 (binding) >> >> On Wed, May 7, 2025 at 4:14 AM Gang Wu wrote: >> >>> The clarification is simple and clear from the writer's p

Re: [VOTE] Minor clarification for Geo Spec

2025-05-07 Thread Russell Spitzer
+1 (bind) On Wed, May 7, 2025 at 7:32 AM Eduard Tudenhöfner wrote: > +1 (binding) > > On Wed, May 7, 2025 at 4:14 AM Gang Wu wrote: > >> The clarification is simple and clear from the writer's perspective. >> >> CMIW, the implication is that reader should drop bbox with any NaN value >> regardl

Re: [VOTE] Minor clarification for Geo Spec

2025-05-07 Thread Amogh Jahagirdar
+1 (binding) On Wed, May 7, 2025 at 6:32 AM Eduard Tudenhöfner wrote: > +1 (binding) > > On Wed, May 7, 2025 at 4:14 AM Gang Wu wrote: > >> The clarification is simple and clear from the writer's perspective. >> >> CMIW, the implication is that reader should drop bbox with any NaN value >> rega

Re: [VOTE] Minor clarification for Geo Spec

2025-05-07 Thread Eduard Tudenhöfner
+1 (binding) On Wed, May 7, 2025 at 4:14 AM Gang Wu wrote: > The clarification is simple and clear from the writer's perspective. > > CMIW, the implication is that reader should drop bbox with any NaN value > regardless of the coordinate axis (in case of a writer bug). > > On Wed, May 7, 2025 at

Re: [DISCUSS] FileFormat API proposal

2025-05-07 Thread Péter Váry
Hi everyone, The proposed API part is reviewed and ready to go. See: https://github.com/apache/iceberg/pull/12774 Thanks to everyone who reviewed it already! Many of you wanted to review, but I know that the time constraints are there for everyone. I still very much would like to hear your voices,

Re: [DISCUSS] Events Endpoint for IRC

2025-05-07 Thread Christian Thiel
Dear all, I worked the changes discussed in the last catalog sync into the Events proposal [1]. Those include: - Using request-id instead of transaction - A more flexible User (now called Actor) - Custom Operation type The specific diff compared to the last discussion can be best seen in my lates