Re: [Discussion] Apache Iceberg Community Guideline - Initial Version

2024-07-12 Thread Justin Mclean
Hi, My recommendation would be not to call them bylaws. Justin > On 12 Jul 2024, at 1:30 am, Jack Ye wrote: > > Update: > > Based on the conversations on the incubator list > (https://lists.apache.org/thread/5ojxny76fr7n1y0hs2rxhr55g1fgcsln), I have > updated the document title from "gui

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Micah Kornfield
I don't think this needs to hold up the PR but I think coming to a consensus on the exact set of types supported is worthwhile (and if the goal is to maintain the same set as specified by the Spark Variant type or if divergence is expected/allowed). From a fragmentation perspective it would be a s

Re: Core:support redis and http lock-manager

2024-07-12 Thread Amogh Jahagirdar
I have pretty similar concerns as Ryan. I don't think we should be adding any new locking implementations to the project since at the moment the only catalog which requires it for atomic operations is HadoopCatalog (and it doesn't even completely address all the cases). Every locking implementation

Re: Re: Re: Re: Refactor the code of HadoopTableOptions

2024-07-12 Thread Ryan Blue
FileIO purposely does not support a rename operation because we wanted to keep a minimal API that handled object stores correctly rather than using a FileSystem concept. While we may need some extensions outside of what the core provides for reading and writing tables, I think we still need to be c

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Russell Spitzer
Just talked with Aihua and he's working on the Spec PR now. We can get feedback there from everyone. On Fri, Jul 12, 2024 at 3:41 PM Ryan Blue wrote: > Good idea, but I'm hoping that we can continue to get their feedback in > parallel to getting the spec changes started. Piotr didn't seem to obj

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Ryan Blue
Good idea, but I'm hoping that we can continue to get their feedback in parallel to getting the spec changes started. Piotr didn't seem to object to the encoding from what I read of his comments. Hopefully he (and others) chime in here. On Fri, Jul 12, 2024 at 1:32 PM Russell Spitzer wrote: > I

Re: [VOTE] Release Apache Iceberg 1.6.0 RC0

2024-07-12 Thread Russell Spitzer
+1 - Checked all the normal things (Rat, Tests, Build, Spark) On Fri, Jul 12, 2024 at 1:14 PM Dmitri Bourlatchkov wrote: > +1 (nb) > > I verified OAuth2 in the REST Catalog with Spark / Keycloak (client > secret) / Nessie. > > The token URI warning is prominently displayed, when `oauth2-server-ur

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Russell Spitzer
I just want to make sure we get Piotr and Peter on board as representatives of Flink and Trino engines. Also make sure we have anyone else chime in who has experience with Ray if possible. Spec changes feel like the right next step. On Fri, Jul 12, 2024 at 3:14 PM Ryan Blue wrote: > Okay, what

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Ryan Blue
Okay, what are the next steps here? This proposal has been out for quite a while and I don't see any major objections to using the Spark encoding. It's quite well designed and fits the need well. It can also be extended to support additional types that are missing if that's a priority. Should we m

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Russell Spitzer
That's fair, I'm sold on an Iceberg Module. On Fri, Jul 12, 2024 at 2:53 PM Ryan Blue wrote: > > Feels like eventually the encoding should land in parquet proper right? > > What about using it in ORC? I don't know where it should end up. Maybe > Iceberg should make a standalone module from it? >

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Ryan Blue
> Feels like eventually the encoding should land in parquet proper right? What about using it in ORC? I don't know where it should end up. Maybe Iceberg should make a standalone module from it? On Fri, Jul 12, 2024 at 12:38 PM Russell Spitzer wrote: > Feels like eventually the encoding should l

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Russell Spitzer
Feels like eventually the encoding should land in parquet proper right? I'm fine with us just copying into Iceberg though for the time being. On Fri, Jul 12, 2024 at 2:31 PM Ryan Blue wrote: > Oops, it looks like I missed where Aihua brought this up in his last email: > > > do we have an issue t

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Ryan Blue
Oops, it looks like I missed where Aihua brought this up in his last email: > do we have an issue to directly use Spark implementation in Iceberg? Yes, I think that we do have an issue using the Spark library. What do you think about a Java implementation in Iceberg? Ryan On Fri, Jul 12, 2024 a

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Ryan Blue
I raised the same point from Peter's email in a comment on the doc as well. There is a spark-variant_2.13 artifact that would be a much smaller scope than relying on large portions of Spark, but even then I doubt that it is a good idea for Iceberg to depend on that because it is a Scala artifact

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Péter Váry
Hi Aihua, Long time no see :) Would this mean that every engine which plans to support the Variant data type needs to add Spark as a dependency? Like Flink/Trino/Hive etc? Thanks, Peter On Fri, Jul 12, 2024, 19:10 Aihua Xu wrote: > Thanks Ryan. > > Yeah. That's another reason we want to pursue Spa

[DISCUSS] Merging specification clarifications

2024-07-12 Thread Micah Kornfield
Hi, I have two open pull requests to clarify points on the specification [1][2]. I believe these both document current behavior and don't represent a specification change (and they were already discussed on the mailing list). But given the recent focus on the spec update process, I wanted to ask if the

Re: [VOTE] Release Apache Iceberg 1.6.0 RC0

2024-07-12 Thread Dmitri Bourlatchkov
+1 (nb) I verified OAuth2 in the REST Catalog with Spark / Keycloak (client secret) / Nessie. The token URI warning is prominently displayed, when `oauth2-server-uri` is not configured. When the token URI is configured, the client secret flow works fine with Keycloak. Cheers, Dmitri. On Fri, J

Re: Core:support redis and http lock-manager

2024-07-12 Thread Ryan Blue
I think one of the main questions is whether we want to support locking strategies moving forward. These were needed in early catalogs that didn't have support for atomic operations (HadoopCatalog and GlueCatalog). Now, Glue supports atomic commits and we have been discouraging the use of HadoopCat

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Aihua Xu
Thanks Ryan. Yeah. That's another reason we want to pursue Spark encoding to keep compatibility for the open source engines. One more question regarding the encoding implementation: do we have an issue to directly use Spark implementation in Iceberg? Russell pointed out that Trino doesn't have

Re: Spark: Copy Table Action

2024-07-12 Thread Sumedh Sakdeo
This is a useful addition. I believe it is important to list the requirements for such an action in greater detail, especially what is in scope and what is not. Some open questions that could be added to the requirements / non-requirements section are 1. Should the copied table registered

Re: [VOTE] Release Apache Iceberg 1.6.0 RC0

2024-07-12 Thread Piotr Findeisen
Hi, The release is probably good to go, but I didn't verify it, so neither -1 nor +1 from me. Still, it would be awesome if we could include these PRs, which are somewhat important for Trino https://github.com/apache/iceberg/pull/10691 (OOM fix, especially under concurrency or for tables with large numbers of f

Re: [VOTE] Release Apache Iceberg 1.6.0 RC0

2024-07-12 Thread Robert Stupp
+1 (nb) On 12.07.24 16:48, Jean-Baptiste Onofré wrote: Hi everyone, I propose that we release the following RC as the official Apache Iceberg 1.6.0 release. The commit ID is ed228f79cd3e569e04af8a8ab411811803bf3a29 * This corresponds to the tag: apache-iceberg-1.6.0-rc0 * https://github.com/ap

Re: [DISCUSS] Describing REST Server capabilities

2024-07-12 Thread Eduard Tudenhöfner
Let's remove the *remote-signing* capability for now and go with *tables / views / multi-table-commit*. As I mentioned earlier, we can always add it when there's a clear benefit. Eduard On Fri, Jul 12, 2024 at 5:09 PM Dmitri Bourlatchkov wrote: > After more thinking about the "remote signing" c

Re: [DISCUSS] Describing REST Server capabilities

2024-07-12 Thread Dmitri Bourlatchkov
After more thinking about the "remote signing" capability flag, I am still not sure it is actually useful for making decisions on the client side. Granted, the client may have s3.remote-signing-enabled=true set independently of the server and then use the remote signing call paths. However, in thi

Re: [Early Feedback] Variant and Subcolumnarization Support

2024-07-12 Thread Ryan Blue
Thanks, Aihua! I think that the encoding choice in the current doc is a good one. I went through the Spark encoding in detail and it looks like a better choice than the other candidate encodings for quickly accessing nested fields. Another reason to use the Spark type is that this is what Delta's

[VOTE] Release Apache Iceberg 1.6.0 RC0

2024-07-12 Thread Jean-Baptiste Onofré
Hi everyone, I propose that we release the following RC as the official Apache Iceberg 1.6.0 release. The commit ID is ed228f79cd3e569e04af8a8ab411811803bf3a29 * This corresponds to the tag: apache-iceberg-1.6.0-rc0 * https://github.com/apache/iceberg/commits/apache-iceberg-1.6.0-rc0 * https://g

Re: [DISCUSS] Describing REST Server capabilities

2024-07-12 Thread Eduard Tudenhöfner
> > But why does this logic not apply to per-endpoint versioning? Isn't it > also nice to just fail at client side instead of calling server and getting > a "generic 501"? Yes of course that would be nice, but that would be at the cost of having finer-grained capabilities which we want to avoid b