Re: [DISCUSS] REST: OAuth2 Authentication Guide

2024-11-01 Thread Christian Thiel
Thank you for your feedback, everyone! It would be great if we could get some more eyes from the community on the server-side token exchange section at the bottom of the document. Are we aware of any OAuth2 secured implementations that provide tokens from the resource server to the client apart

Re: [DISCUSS] Change Behavior for SchemaUpdate.UnionByName

2024-11-01 Thread Rocco Varela
Thanks Russell and Fokko. I updated my PR with the suggested updates. Cheers, --Rocco On Fri, Nov 1, 2024 at 3:01 AM Fokko Driesprong wrote: > Hey Rocco, > > Thanks for raising this. I don't have any strong feelings about this, and > I agree with Russell that it should not throw an exception.

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-01 Thread Steven Wu
+1 (binding) Verified signature, checksum, license. Did Flink SQL local testing with the runtime jar. Didn't run the build because Azure FileIO testing requires a Docker environment. On Fri, Nov 1, 2024 at 5:02 AM Fokko Driesprong wrote: > Thanks Russell for running this release! > > +1 (binding) > >

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-01 Thread Steven Wu
Shani, That is a good point. It is certainly a limitation for the Flink job to track the inverted index internally (which is what I had in mind). It can't be shared/synchronized with other Flink jobs or other engines writing to the same table. Thanks, Steven On Fri, Nov 1, 2024 at 10:50 AM Shani

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-01 Thread Shani Elharrar
Even if Flink can create this state, it would have to be maintained against the Iceberg table; we wouldn't want duplicate keys if other systems or users update the table (e.g., manual inserts or updates using DML). Shani. On 1 Nov 2024, at 18:32, Steven Wu wrote: > Add support for inverted indexes to

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-01 Thread Steven Wu
Fundamentally, it is very difficult to write position deletes with concurrent writers and conflicts for batch jobs too, as the inverted index may become invalid/stale. The position deletes are created during the write phase. But conflicts are only detected at the commit stage. I assume the batch j

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-01 Thread Shani Elharrar
I understand how it makes sense for batch jobs, but it hurts streaming jobs. Equality deletes work much better for streaming (which has strict SLAs on delay), and to reduce the performance penalty, systems can rewrite the equality deletes to positional deletes. Shani. On 1 Nov

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-01 Thread Fokko Driesprong
Thanks Russell for running this release! +1 (binding) Checked signatures, checksum, licenses and did some local testing. Kind regards, Fokko On Thu, Oct 31, 2024 at 17:47, Russell Spitzer < russell.spit...@gmail.com> wrote: > @Manu Zhang You are definitely right, I'll get > that in before I do

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-01 Thread Steven Wu
> Add support for inverted indexes to reduce the cost of position lookup. This is fairly tricky to implement for streaming use cases without an external system. Anton, that is also what I was saying earlier. In Flink, the inverted index of (key, committed data files) can be tracked in Flink state.

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-01 Thread Anton Okolnychyi
Steven, do you have any pointers? In particular, I am curious to learn where the state will be stored, whether it will be distributed, the lookup cost, how to incrementally maintain that index, etc. - Anton On Fri, Nov 1, 2024 at 17:32, Steven Wu wrote: > > Add support for inverted indexes to redu

Re: [DISCUSS] Partial Metadata Loading

2024-11-01 Thread Dmitri Bourlatchkov
Hello All, This is an interesting discussion and I'd like to offer my perspective. When a REST Catalog is involved, the metadata is loaded and modified via the catalog API. So control over the metadata is delegated to the catalog. I'd argue that in this situation, catalogs should have the flexib

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-01 Thread Daniel Weeks
+1 (binding) Verified sigs/sums/license/build/test Also did some manual verification using spark and everything checks out. -Dan On Fri, Nov 1, 2024 at 10:52 AM Steven Wu wrote: > +1 (binding) > > Verified signature, checksum, license. Did Flink SQL local testing with > the runtime jar. > > D

Re: [DISCUSS] REST: OAuth2 Authentication Guide

2024-11-01 Thread Dmitri Bourlatchkov
Hi Christian, Thanks for pushing this initiative forward. I think it is quite useful. I added some rather minor comments to the doc. One bigger aspect of this, I guess, is that the doc currently talks about what clients should do. This is important, of course. However, if a client is able to obt

Re: [VOTE] Release Apache Iceberg 1.7.0 RC0

2024-11-01 Thread Amogh Jahagirdar
+1 (binding) Verified signature/checksum/license/build/test with JDK17 Thanks for driving the release Russell! Amogh Jahagirdar On Fri, Nov 1, 2024 at 12:36 PM Daniel Weeks wrote: > +1 (binding) > > Verified sigs/sums/license/build/test > > Also did some manual verification using spark and ev

Re: [DISCUSS] Change Behavior for SchemaUpdate.UnionByName

2024-11-01 Thread Fokko Driesprong
Hey Rocco, Thanks for raising this. I don't have any strong feelings about this, and I agree with Russell that it should not throw an exception. I guess there was no strong reason behind how it is today, but it's just because we leverage the UpdateSchema API, which raises an exception when doing

Re: [DISCUSS] - Deprecate Equality Deletes

2024-11-01 Thread Anton Okolnychyi
I was a bit skeptical when we were adding equality deletes, but nothing beats their performance during writes. We have to find an alternative before deprecating. We are doing a lot of work to improve streaming, like reducing the cost of commits, enabling a large (potentially infinite) number of sn