[DISCUSS] Formalized File IO Properties

2024-07-10 Thread Xuanwo
Hello everyone I've been working on the iceberg-rust FileIO recently and have found it challenging to identify all the necessary IO properties we need to support. For instance, consider AWS S3. There are no documents specifying which properties are supported by S3. The only relevant documentat

Re: Building with JDK 21

2024-07-10 Thread Renjie Liu
I guess it depends on the generated class file version. On Wed, Jul 10, 2024 at 2:53 PM Manu Zhang wrote: > I suppose Iceberg built with Java 11/17/21 (at least one of them) can run > with Spark/Hadoop built with Java 8? > > On Wed, Jul 10, 2024 at 11:59 AM Steven Wu wrote: > >> +1 for dropping
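
To illustrate the class file version point above, here is a minimal Python sketch (not taken from the thread) that reads the target Java version from a class file header. The function name `class_file_java_version` is hypothetical. It assumes the standard class file layout: a 4-byte magic number 0xCAFEBABE followed by 2-byte minor and major versions, where major 52 corresponds to Java 8 and major 65 to Java 21.

```python
import struct


def class_file_java_version(header: bytes) -> int:
    """Return the Java release a class file targets, given its first 8 bytes."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of class file header")
    # Big-endian: u4 magic, u2 minor_version, u2 major_version.
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    # Major version 52 is Java 8, 55 is Java 11, 61 is Java 17, 65 is Java 21.
    return major - 44


if __name__ == "__main__":
    # Synthetic header for a class compiled for Java 8 (major version 52).
    java8_header = struct.pack(">IHH", 0xCAFEBABE, 0, 52)
    print(class_file_java_version(java8_header))
```

A JVM refuses to load classes whose major version is newer than it supports, so whether an Iceberg jar runs on a Java 8 runtime depends on the version it was compiled for, not on the JDK used to run the build.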

Re: Building with JDK 21

2024-07-10 Thread Eduard Tudenhöfner
I'm generally in favor of dropping JDK 8 support and moving to JDK 21. The one blocker that is currently preventing us from doing that is Hive. What about deprecating the Hive stuff e.g. with 1.6.0 and dropping it with 1.7.0 and moving to the newer JDK in 1.7.0? Iceberg 1.6.0 would then be the las

Re: Building with JDK 21

2024-07-10 Thread Jean-Baptiste Onofré
Hi Piotr I already commented on the PR directly, so let me share here: I'm in favor of dropping Java8 and directly jumping to Java21. However, to do this jump, I would need to remove "old" modules, like Hive. I think it's totally acceptable on a new major version. I shared this in the thread about

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Jean-Baptiste Onofré
+1 (non binding) Regards JB On Wed, Jul 10, 2024 at 5:35 AM Eduard Tudenhöfner wrote: > > Hey everyone, > > I propose to fix the property names in the REST spec for statistics / > partition statistics so that they are properly aligned with the table spec > and the implementation. > > Please vo

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-10 Thread Jean-Baptiste Onofré
Yes and no :) It's a beta feature. My point is that we can enable GitHub Discussion easily, and depending on the timing, asf.yaml support will be added. Regards JB On Tue, Jul 9, 2024 at 7:49 AM Xuanwo wrote: > > Hi, > > > Regarding the discussion tab, it sounds good to me. It's pretty > strai

Re: [Vote] Deprecate oauth tokens endpoint

2024-07-10 Thread Jean-Baptiste Onofré
+1 (non binding) NB: a few comments in the PR should be "addressed", but it's OK. Regards JB On Mon, Jul 8, 2024 at 6:15 PM Robert Stupp wrote: > > Hi Everyone, > > I propose that we merge PR to "Deprecate oauth/tokens endpoint". > > The background and overall plan is discussed on this mailing

Re: Building with JDK 21

2024-07-10 Thread Piotr Findeisen
Hi, Thank you all for your comments and perspectives. Summing up so far: It is clear that dropping JDK 8 is imminent, but it is also inevitably painful for some users. We don't have a precise date/version for when Java 8 can be dropped (along with the Hive module), and Iceberg 1.7 was proposed for that. We

Re: Building with JDK 21

2024-07-10 Thread Jean-Baptiste Onofré
Hi Piotr, Even if it might fail at the beginning, I think we can start Java21 on a profile/CI even if we still use Java8 by default. So, agree to: 1. Enable Java21 in a profile (without formatter for now), the default is still Java8 2. When Java8 is dropped and we enable Java21 by default, we also sw

Re: [Vote] Deprecate oauth tokens endpoint

2024-07-10 Thread Driesprong, Fokko
+1 (binding) Op wo 10 jul 2024 om 10:14 schreef Jean-Baptiste Onofré : > +1 (non binding) > > NB: a few comments in the PR should be "addressed", but it's OK. > > Regards > JB > > On Mon, Jul 8, 2024 at 6:15 PM Robert Stupp wrote: > > > > Hi Everyone, > > > > I propose that we merge PR to "Depre

Re: [Vote] Deprecate oauth tokens endpoint

2024-07-10 Thread roryqi
+1. Driesprong, Fokko 于2024年7月10日周三 16:29写道: > +1 (binding) > > Op wo 10 jul 2024 om 10:14 schreef Jean-Baptiste Onofré : > >> +1 (non binding) >> >> NB: a few comments in the PR should be "addressed", but it's OK. >> >> Regards >> JB >> >> On Mon, Jul 8, 2024 at 6:15 PM Robert Stupp wrote: >>

Re: [DISCUSS] Extend Snapshot Metadata Lifecycle

2024-07-10 Thread Péter Váry
> I believe DeleteOrphanFiles may be ok as is, because currently the logic walks down the reachable graph and marks those metadata files as 'not-orphan', so it should naturally walk these 'expired' snapshots as well. We need to keep the metadata files, but remove data files if they are not removed

Re: Building with JDK 21

2024-07-10 Thread Fokko Driesprong
Thanks Piotr for raising this and summing it up so far. The timing of deprecating is always hard, but it looks like there is a lot of traction within the Java data ecosystem to move to a later version: - Avro 1.12.0 will be JDK17+ - Spark 4.x will be JDK17+ - Arrow 18 will be JDK11+ -

Re: Building with JDK 21

2024-07-10 Thread Eduard Tudenhöfner
> > > It has a caveat (we can't run formatter on 21 and 8, and we need to choose >> one). > > > Would it format differently? I would go for 21 since that's the path > forward, but I'm also fine with JB's suggestion 👍 > >> >> Yeah it would format differently, because the underlying *google-java-form

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-10 Thread Fokko Driesprong
Thanks for raising this. I would also prefer discussions over a user mailing-list since it has a lower barrier. We could also first enable this on Iceberg-rust and evaluate it after a while to see the added value and then decide for Python and Java? WDYT? Kind regards, Fokko Op wo 10 jul 2024 om

Iceberg namespace operation behaviors

2024-07-10 Thread Yong Zhang
Hi all, When I was using the SupportsNamespaces to do the namespace create and exists check with other catalog services that implement the Iceberg REST API, I found it has different results with different catalog services. Iceberg has provided a standard API to do the namespace operations, but see

Re: [DISCUSS] Describing REST Server capabilities

2024-07-10 Thread Eduard Tudenhöfner
Hey everyone, I've added a few inline comments below. > Re: remote signing, I agree that it does not look like a server capability > that a client can / should discover. It is more like something that the > server instructs / configures the client to do. While a server can control this behavi

Re: [DISCUSS] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Fokko Driesprong
Hey everyone, I'm fine with a vote, it is a change to the spec indeed, but it is because of a discrepancy between the reference implementation and the spec, so therefore you can also see it as fixing a bug. Let me give some context around how this is done for PyIceberg and Iceberg-Rust. Clients

Re: [DISCUSS] Formalized File IO Properties

2024-07-10 Thread Fokko Driesprong
Hey Xuanwo, Thanks for raising this. - The S3 properties are largely covered under the S3FileIO page: https://iceberg.apache.org/docs/nightly/aws/#s3-fileio. But it looks like some important ones are missing indeed. I've raised an issue here

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-10 Thread Renjie Liu
I’m fine with enabling it in iceberg-rust first and see how it goes. On Wed, Jul 10, 2024 at 17:39 Fokko Driesprong wrote: > Thanks for raising this. I would also prefer discussions over a user > mailing-list since it has a lower barrier. We could also first enable this > on Iceberg-rust and eva

Re: Building with JDK 21

2024-07-10 Thread Piotr Findeisen
Hi, Thanks for the additional feedback. It sounds like we're OK enabling builds and testing with JDK 21 with the caveat that the formatter is off. The non-negotiable condition is that CI still checks format. I think this is exactly what this PR is doing, whi

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Piotr Findeisen
+1 (non binding) On Wed, 10 Jul 2024 at 10:11, Jean-Baptiste Onofré wrote: > +1 (non binding) > > Regards > JB > > On Wed, Jul 10, 2024 at 5:35 AM Eduard Tudenhöfner > wrote: > > > > Hey everyone, > > > > I propose to fix the property names in the REST spec for statistics / > partition statisti

Re: [DISCUSS] Formalized File IO Properties

2024-07-10 Thread ndrluis
Hello Everyone, I was considering discussing the standardization of Iceberg properties, and I believe this thread could be a great place to start. I'm writing an Iceberg client in Elixir and using the Java, Python, and Rust implementations as references. However, I've had some difficulty determ

Re: Iceberg namespace operation behaviors

2024-07-10 Thread Renjie Liu
Hi, Yong: Thanks for reporting this. Do you think iceberg needs to standardize the REST spec to make sure the > logic is the same for using the SupportsNamespace interface? Yes, I think it's valuable. I remember @Jean-Baptiste Onofré had a proposal for TCK of rest catalog, and this case shows

Re: [DISCUSS] Formalized File IO Properties

2024-07-10 Thread Renjie Liu
Hi: +1 for standardizing iceberg properties. This will help to align different language implementations. On Wed, Jul 10, 2024 at 9:44 PM wrote: > Hello Everyone, > > I was considering discussing the standardization of Iceberg properties, > and I believe this thread could be a great place to sta

Re: [Vote] Deprecate oauth tokens endpoint

2024-07-10 Thread Renjie Liu
+1 (non binding) On Wed, Jul 10, 2024 at 4:35 PM roryqi wrote: > +1. > > Driesprong, Fokko 于2024年7月10日周三 16:29写道: > >> +1 (binding) >> >> Op wo 10 jul 2024 om 10:14 schreef Jean-Baptiste Onofré > >: >> >>> +1 (non binding) >>> >>> NB: a few comments in the PR should be "addressed", but it's OK.

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Amogh Jahagirdar
+1 (non-binding) On Wed, Jul 10, 2024 at 7:16 AM Piotr Findeisen wrote: > +1 (non binding) > > On Wed, 10 Jul 2024 at 10:11, Jean-Baptiste Onofré > wrote: > >> +1 (non binding) >> >> Regards >> JB >> >> On Wed, Jul 10, 2024 at 5:35 AM Eduard Tudenhöfner >> wrote: >> > >> > Hey everyone, >> > >

Re: [DISCUSS] Formalized File IO Properties

2024-07-10 Thread Russell Spitzer
Sounds reasonable to me On Wed, Jul 10, 2024 at 9:28 AM Renjie Liu wrote: > Hi: > > +1 for standardizing iceberg properties. This will help to align different > language implementations. > > On Wed, Jul 10, 2024 at 9:44 PM wrote: > >> Hello Everyone, >> >> I was considering discussing the stand

Re: [DISCUSS] Describing REST Server capabilities

2024-07-10 Thread Dmitri Bourlatchkov
Re: remote signing, I agree that it does not look like a server capability > that a client can / should discover. It is more like something that the > server instructs / configures the client to do. While a server can control this behavior and instruct the client to use remote signing, technicall

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Russell Spitzer
+1 On Wed, Jul 10, 2024 at 9:47 AM Amogh Jahagirdar <2am...@gmail.com> wrote: > +1 (non-binding) > > On Wed, Jul 10, 2024 at 7:16 AM Piotr Findeisen > wrote: > >> +1 (non binding) >> >> On Wed, 10 Jul 2024 at 10:11, Jean-Baptiste Onofré >> wrote: >> >>> +1 (non binding) >>> >>> Regards >>> JB >

Re: Building with JDK 21

2024-07-10 Thread Robert Stupp
+1 on deprecating Java 8 in the next "catchable" Iceberg release. +1 on removing Java 8 in Iceberg 2.0 +1 on _building_ for Java 11 in Iceberg 2.0 +1 on _building_ with Java 17 in Iceberg 2.0 I'd recommend to _build_ for Java 11 wherever possible (`javac --release 11`), because many users ar

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Robert Stupp
+1 (nb) On 10.07.24 05:35, Eduard Tudenhöfner wrote: Hey everyone, I propose to fix the property names in the REST spec for statistics / partition statistics so that they are properly aligned with the table spec

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Steve Zhang
+1 (non binding) Thanks, Steve Zhang > On Jul 10, 2024, at 1:10 AM, Jean-Baptiste Onofré wrote: > > +1 (non binding)

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Jack Ye
+1 (binding) -Jack On Wed, Jul 10, 2024 at 9:00 AM Steve Zhang wrote: > +1 (non binding) > > Thanks, > Steve Zhang > > > > On Jul 10, 2024, at 1:10 AM, Jean-Baptiste Onofré wrote: > > +1 (non binding) > > >

Re: [Vote] Deprecate oauth tokens endpoint

2024-07-10 Thread Russell Spitzer
+1 On Wed, Jul 10, 2024 at 11:03 AM Russell Spitzer wrote: > +` > > On Wed, Jul 10, 2024 at 9:33 AM Renjie Liu > wrote: > >> +1 (non binding) >> >> On Wed, Jul 10, 2024 at 4:35 PM roryqi wrote: >> >>> +1. >>> >>> Driesprong, Fokko 于2024年7月10日周三 16:29写道: >>> +1 (binding) Op wo 1

Re: [Vote] Deprecate oauth tokens endpoint

2024-07-10 Thread Russell Spitzer
+` On Wed, Jul 10, 2024 at 9:33 AM Renjie Liu wrote: > +1 (non binding) > > On Wed, Jul 10, 2024 at 4:35 PM roryqi wrote: > >> +1. >> >> Driesprong, Fokko 于2024年7月10日周三 16:29写道: >> >>> +1 (binding) >>> >>> Op wo 10 jul 2024 om 10:14 schreef Jean-Baptiste Onofré >> >: >>> +1 (non binding)

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-10 Thread Russell Spitzer
I'm a fan of having more things on github if possible. I haven't used this feature but it sounds like it could be useful. On Wed, Jul 10, 2024 at 6:15 AM Renjie Liu wrote: > I’m fine with enabling it in iceberg-rust first and see how it goes. > > On Wed, Jul 10, 2024 at 17:39 Fokko Driesprong w

Re: [DISCUSS] Formalized File IO Properties

2024-07-10 Thread Alex Dutra
Hi, Also +1 on standardizing properties. I'm looking forward to this discussion topic. In particular, REST catalog properties imho should be standardized with the "rest." prefix, and REST auth properties should imho have a prefix like "rest.auth..", e.g. "rest.auth.oauth2.issuer-url". Thanks, Ale
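
A minimal sketch of how a client might consume such prefixed properties, assuming the "rest." and "rest.auth." prefix convention suggested above. Only "rest.auth.oauth2.issuer-url" comes from the message; the helper name `properties_with_prefix`, the other keys, and all values are hypothetical and for illustration only.

```python
def properties_with_prefix(props: dict[str, str], prefix: str) -> dict[str, str]:
    """Return the properties that start with `prefix`, with the prefix removed."""
    return {
        key[len(prefix):]: value
        for key, value in props.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    # Hypothetical catalog configuration; only the issuer-url key is from the thread.
    catalog_props = {
        "rest.auth.oauth2.issuer-url": "https://idp.example.com",
        "rest.example-setting": "value",
        "s3.example-setting": "value",
    }
    # Select only the REST auth properties.
    print(properties_with_prefix(catalog_props, "rest.auth."))
```

With a shared prefix scheme, each language implementation could select its REST or auth settings with the same one-line filter instead of maintaining its own list of known keys.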

Re: [Vote] Deprecate oauth tokens endpoint

2024-07-10 Thread Steve Zhang
+1 (non binding) Thanks, Steve Zhang > On Jul 10, 2024, at 7:31 AM, Renjie Liu wrote: > > +1 (non binding)

Re: Spark: Copy Table Action

2024-07-10 Thread Ajantha Bhat
> > For RemoveExpiredFiles, I'm admittedly a bit skeptical if it's required > since orphan file removal should be able to cleanup the files in the > copied table. Are we able to elaborate why there's a concern with removing > snapshots on the copied table and subsequently relying on orphan file > r

Re: Iceberg namespace operation behaviors

2024-07-10 Thread Ajantha Bhat
Thanks for initiating this discussion. I suggested moving it to the mailing list because the javadoc of SupportsNamespaces and the REST API spec don't clearly define how to handle missing parent namespaces. I'm generally +1 of

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Ajantha Bhat
+1 (non-binding) - Ajantha On Wed, Jul 10, 2024 at 9:33 PM Jack Ye wrote: > +1 (binding) > > -Jack > > On Wed, Jul 10, 2024 at 9:00 AM Steve Zhang > wrote: > >> +1 (non binding) >> >> Thanks, >> Steve Zhang >> >> >> >> On Jul 10, 2024, at 1:10 AM, Jean-Baptiste Onofré >> wrote: >> >> +1 (non

Re: [DISCUSS] Formalized File IO Properties

2024-07-10 Thread ndrluis
I don't know what the recommended way to start standardizing is. We can start a proposal for each context or have one proposal to handle all. Suggested contexts to start with: - Rest Catalog - FileIO I believe that most of the other cases are supported by the configuration topic in the Table

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Ryan Blue
+1 On Wed, Jul 10, 2024 at 10:09 AM Ajantha Bhat wrote: > +1 (non-binding) > > - Ajantha > > On Wed, Jul 10, 2024 at 9:33 PM Jack Ye wrote: > >> +1 (binding) >> >> -Jack >> >> On Wed, Jul 10, 2024 at 9:00 AM Steve Zhang >> wrote: >> >>> +1 (non binding) >>> >>> Thanks, >>> Steve Zhang >>> >>>

Re: [VOTE] Fix property names in REST spec for statistics / partition statistics

2024-07-10 Thread Péter Váry
+1 (non-binding - at least as a committer I think I have a non-binding vote :) ) On Wed, Jul 10, 2024, 20:14 Ryan Blue wrote: > +1 > > On Wed, Jul 10, 2024 at 10:09 AM Ajantha Bhat > wrote: > >> +1 (non-binding) >> >> - Ajantha >> >> On Wed, Jul 10, 2024 at 9:33 PM Jack Ye wrote: >> >>> +1 (bi

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-10 Thread Jack Ye
+1 for enabling it on iceberg-rust first. I am curious how search-engine friendly this is, that will be a huge plus if people can find these discussion contents. For people who have used it more like Piotr, do you know about this? I search Trino things quite a lot on Google, but rarely did I find

Meeting Minutes 2024-05-08

2024-07-10 Thread Brian Olsen
Hey Iceberg Nation, Here are the minutes from the last few meetings. I've had some adjustments after moving on from Tabular, thanks for bearing with me. Transcription/Recording https://youtu.be/ekR0HOvjvI4 Summary 0:18 Geo support proposal has been added, community feedback is requ

Meeting Minutes 2024-05-29

2024-07-10 Thread Brian Olsen
Hey Iceberg Nation, Here are the minutes from the last few meetings. I've had some adjustments after moving on from Tabular, thanks for bearing with me. Transcription/Recording https://youtu.be/5xkhGDfFvGU Summary 0:12 Iceberg Summit was a big success with great community particip

Meeting Minutes 2024-06-19

2024-07-10 Thread Brian Olsen
Hey Iceberg Nation, Here are the minutes from the last few meetings. I've had some adjustments after moving on from Tabular, thanks for bearing with me. Transcription/Recording https://youtu.be/j1GncDMj8HY Summary 0:11 Several contributors have switched employers, but remain commi

Meeting Minutes 2024-07-10

2024-07-10 Thread Brian Olsen
Hey Iceberg Nation, Here are the minutes from the last few meetings. I've had some adjustments after moving on from Tabular, thanks for bearing with me. Transcription/Recording https://youtu.be/jAWka8g0o7c Summary 0:11 Significant progress on geospatial support proposal, addressin

[VOTE] spec: remove the JSON spec for content file and file scan task sections

2024-07-10 Thread Steven Wu
Following the latest community guidelines, I would like to start a voting thread on removing the JSON spec for content file and file scan task. Here is the PR for the spec change [1] This was previously discussed in the dev mailing list [2]. While it is good to add the JSON serializer in iceberg-c

Re: [VOTE] spec: remove the JSON spec for content file and file scan task sections

2024-07-10 Thread Ryan Blue
+1 Thanks, Steven! On Wed, Jul 10, 2024 at 3:50 PM Steven Wu wrote: > Following the latest community guidelines, I would like to start a voting > thread on removing the JSON spec for content file and file scan task. Here > is the PR for the spec change [1] > > This was previously discussed in t

Re: [VOTE] spec: remove the JSON spec for content file and file scan task sections

2024-07-10 Thread Renjie Liu
+1 (non binding) On Thu, Jul 11, 2024 at 7:22 AM Ryan Blue wrote: > +1 > > Thanks, Steven! > > On Wed, Jul 10, 2024 at 3:50 PM Steven Wu wrote: > >> Following the latest community guidelines, I would like to start a voting >> thread on removing the JSON spec for content file and file scan task.

Re: [VOTE] spec: remove the JSON spec for content file and file scan task sections

2024-07-10 Thread Xuanwo
+1 non-binding. The iceberg-rust project doesn't refer to this either. On Thu, Jul 11, 2024, at 09:54, Renjie Liu wrote: > +1 (non binding) > > On Thu, Jul 11, 2024 at 7:22 AM Ryan Blue wrote: >> +1 >> >> Thanks, Steven! >> >> On Wed, Jul 10, 2024 at 3:50 PM Steven Wu wrote: >>> Following the

Re: [VOTE] spec: remove the JSON spec for content file and file scan task sections

2024-07-10 Thread John Zhuge
+1 (non binding) John Zhuge On Wed, Jul 10, 2024 at 6:57 PM Xuanwo wrote: > +1 non-binding. > > The iceberg-rust project doesn't refer to this either. > > On Thu, Jul 11, 2024, at 09:54, Renjie Liu wrote: > > +1 (non binding) > > On Thu, Jul 11, 2024 at 7:22 AM Ryan Blue > wrote: > > +1 > > Tha

Re: [DISCUSS] Enable the discussion tab for iceberg github repos

2024-07-10 Thread Jean-Baptiste Onofré
It sounds reasonable to me. We are using GH Discussions on some ASF projects and it works pretty well. Regards JB On Wed, Jul 10, 2024 at 11:39 AM Fokko Driesprong wrote: > > Thanks for raising this. I would also prefer discussions over a user > mailing-list since it has a lower barrier. We co

Re: [VOTE] spec: remove the JSON spec for content file and file scan task sections

2024-07-10 Thread Jean-Baptiste Onofré
+1 (non binding) Regards JB On Thu, Jul 11, 2024 at 12:50 AM Steven Wu wrote: > > Following the latest community guidelines, I would like to start a voting > thread on removing the JSON spec for content file and file scan task. Here is > the PR for the spec change [1] > > This was previously d

Re: [VOTE] spec: remove the JSON spec for content file and file scan task sections

2024-07-10 Thread Ajantha Bhat
+1 (non-binding) - Ajantha On Thu, Jul 11, 2024 at 11:02 AM Jean-Baptiste Onofré wrote: > +1 (non binding) > > Regards > JB > > On Thu, Jul 11, 2024 at 12:50 AM Steven Wu wrote: > > > > Following the latest community guidelines, I would like to start a > voting thread on removing the JSON spec

Re: [VOTE] spec: remove the JSON spec for content file and file scan task sections

2024-07-10 Thread Eduard Tudenhöfner
+1 (non-binding) On Thu, Jul 11, 2024 at 8:29 AM Ajantha Bhat wrote: > +1 (non-binding) > > - Ajantha > > On Thu, Jul 11, 2024 at 11:02 AM Jean-Baptiste Onofré > wrote: > >> +1 (non binding) >> >> Regards >> JB >> >> On Thu, Jul 11, 2024 at 12:50 AM Steven Wu wrote: >> > >> > Following the lat

Re: [DISCUSS] Describing REST Server capabilities

2024-07-10 Thread Eduard Tudenhöfner
Are there any other concerns with the proposal or should we start a VOTE thread? Eduard On Wed, Jul 10, 2024 at 5:20 PM Dmitri Bourlatchkov wrote: > Re: remote signing, I agree that it does not look like a server capability >> that a client can / should discover. It is more like something that