Re: [DISCUSS] Iceberg Rust Sync Meeting

2024-10-10 Christian Thiel
+1 for rust sync. Thanks for the proposal Xuanwo. There are many open topics and alignment in the sync can help to clarify scopes and dependencies to move forward with iceberg-rust even faster. Time is good for me. From: Kevin Liu Sent: Wednesday, October 9, 2

Re: [Discuss] Iceberg community maintaining the docker images

2024-10-10 Ajantha Bhat
Thanks for the discussions. I will just focus on Docker image of the REST catalog TCK first. These are related PRs for the same. https://github.com/apache/iceberg/pull/11279 https://github.com/apache/iceberg/pull/11283 We still need Apache infra help for publishing the image in the Apache docker

Re: Iceberg View Spec Improvements

2024-10-10 Amogh Jahagirdar
I took another pass over the view spec and I believe that representations of identifiers and how resolution of references by engines should be performed is clear. So from my perspective, at the moment we do not need to change the view spec itself. I do acknowledge though that practically there ca

Re: [VOTE] Table V3 Spec: Row Lineage

2024-10-10 Steven Wu
+1 On Thu, Oct 10, 2024 at 2:52 PM Yufei Gu wrote: > +1 > Yufei > > > On Thu, Oct 10, 2024 at 3:47 PM Amogh Jahagirdar <2am...@gmail.com> wrote: > >> +1, I've been reviewing this proposal/spec change for a bit and I think >> it's in a good state for the community to work on an implementation. >>

Re: [PROPOSAL] Partially Loading Metadata - LoadTable V2

2024-10-10 Haizhou Zhao
Thanks Eduard and Dan, At this stage, my main goal is to check around the community whether this problem is worth solving. If I can get sufficient feedback, or better, even consensus from the community, then that lays down a good foundation to further progress this thread. Implementation details a

Re: Iceberg View Spec Improvements

2024-10-10 Daniel Weeks
Russell, I think there are a few existing ways to support that. For example, if you exclude the default catalog and fully reference the table with .. most sql engines will interpret that correctly (for cross or known catalogs). Also, if you omit the catalog and use a just ., it must use the cata
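The resolution behavior Dan describes can be sketched as a small function. This is a minimal sketch under stated assumptions, not the view spec's normative algorithm: `ViewContext`, `resolve_reference`, and `known_catalogs` are hypothetical names; a reference whose first part names a known catalog is treated as fully qualified; and when the view records no default catalog, the sketch falls back to the catalog in which the view is stored (an assumption to check against the spec's `default-catalog` and `default-namespace` fields).

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Tuple


@dataclass
class ViewContext:
    # Hypothetical holder for the view fields that matter for resolution.
    view_catalog: str                   # catalog the view itself is stored in
    default_namespace: Tuple[str, ...]  # namespace for single-part references
    default_catalog: Optional[str] = None


def resolve_reference(
    parts: Sequence[str], ctx: ViewContext, known_catalogs: Set[str]
) -> Tuple[str, Tuple[str, ...], str]:
    """Qualify a table reference from a view's SQL as (catalog, namespace, table)."""
    if not parts:
        raise ValueError("empty table reference")
    # Assumption: fall back to the view's own catalog when no default is recorded.
    catalog = ctx.default_catalog or ctx.view_catalog
    # catalog.namespace...table: the first part names a catalog the engine knows.
    if len(parts) >= 3 and parts[0] in known_catalogs:
        return parts[0], tuple(parts[1:-1]), parts[-1]
    # table: use both the default catalog and the default namespace.
    if len(parts) == 1:
        return catalog, tuple(ctx.default_namespace), parts[0]
    # namespace...table: only the catalog is implied.
    return catalog, tuple(parts[:-1]), parts[-1]
```

The `known_catalogs` check reflects the point that engines interpret fully qualified names correctly only for catalogs they know; anything else is read as a (possibly nested) namespace.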

Re: Iceberg View Spec Improvements

2024-10-10 Walaa Eldin Moustafa
Hi Russell, Would this be a good candidate for a future version of the spec? Thanks, Walaa. On Thu, Oct 10, 2024 at 3:50 PM Russell Spitzer wrote: > I still have an issue with representations not having explicit ways of > incorporating the catalog name, I'm thinking about our potential future

Re: Iceberg View Spec Improvements

2024-10-10 Russell Spitzer
I still have an issue with representations not having explicit ways of incorporating the catalog name, I'm thinking about our potential future situation where we want to return a view for Fine Grained Access policies. In that case won't the Catalog need to craft a representation that matches the co

Re: Iceberg View Spec Improvements

2024-10-10 Walaa Eldin Moustafa
Thanks Dan. I am +1 for documenting unsupported configurations. On Thu, Oct 10, 2024 at 3:34 PM Daniel Weeks wrote: > Hey Walaa, > > I recognize the issue you're calling out but disagree there is an implicit > assumption in the spec. The spec clearly says how identifiers including > catalogs an

Re: Iceberg View Spec Improvements

2024-10-10 Daniel Weeks
Hey Walaa, I recognize the issue you're calling out but disagree there is an implicit assumption in the spec. The spec clearly says how identifiers including catalogs and namespaces are represented/stored and how references need to be resolved. The idea that a catalog may not match is an environ

Re: [VOTE] Table V3 Spec: Row Lineage

2024-10-10 Yufei Gu
+1 Yufei On Thu, Oct 10, 2024 at 3:47 PM Amogh Jahagirdar <2am...@gmail.com> wrote: > +1, I've been reviewing this proposal/spec change for a bit and I think > it's in a good state for the community to work on an implementation. > > Thanks Russell for driving this! > > On Thu, Oct 10, 2024 at 3:

Re: [VOTE] Table V3 Spec: Row Lineage

2024-10-10 Amogh Jahagirdar
+1, I've been reviewing this proposal/spec change for a bit and I think it's in a good state for the community to work on an implementation. Thanks Russell for driving this! On Thu, Oct 10, 2024 at 3:31 PM Jack Ye wrote: > +1, overall agree that we should add this! > > Best, > Jack Ye > > On Th

Re: Iceberg View Spec Improvements

2024-10-10 Walaa Eldin Moustafa
Hi Dan, I think there are a few questions that we should solve to decide the path forward: * Does the current spec contain implicit assumptions? I think the answer is yes. I think this is also what Ryan indicated here [1]. * Do these implicit assumptions make it difficult to adopt the spec or

Spec changes for deletion vectors

2024-10-10 rdb...@gmail.com
Hi everyone, There seems to be broad agreement around Anton's proposal to use deletion vectors in Iceberg v3, so I've opened two PRs that update the spec with the proposed changes. The first, PR #11238, adds a new Puffin blob type, delete-vector
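As a rough illustration of the idea behind deletion vectors, the sketch below models one as a bitmap of deleted row positions within a single data file and uses it to skip rows at read time. This is a conceptual sketch only: `DeletionVector` and `live_rows` are hypothetical names, and the Python `int` bitset is not the on-disk encoding, which the spec PRs define as a Puffin blob.

```python
from typing import Iterable, Iterator, List, Tuple


class DeletionVector:
    """Illustrative bitmap of deleted row positions for one data file.

    Uses a Python int as a bitset; this is NOT the Puffin blob encoding.
    """

    def __init__(self, positions: Iterable[int] = ()) -> None:
        self._bits = 0
        for pos in positions:
            self.delete(pos)

    def delete(self, pos: int) -> None:
        # Mark the row at file position `pos` as deleted.
        if pos < 0:
            raise ValueError("row positions are non-negative")
        self._bits |= 1 << pos

    def is_deleted(self, pos: int) -> bool:
        return ((self._bits >> pos) & 1) == 1

    def cardinality(self) -> int:
        # Number of deleted rows.
        return bin(self._bits).count("1")


def live_rows(rows: List[str], dv: DeletionVector) -> Iterator[Tuple[int, str]]:
    """Yield (position, row) pairs for rows the vector does not mark as deleted."""
    for pos, row in enumerate(rows):
        if not dv.is_deleted(pos):
            yield pos, row
```

For example, `list(live_rows(["a", "b", "c", "d", "e"], DeletionVector([1, 3])))` keeps positions 0, 2, and 4. A bitmap keyed by position makes the per-row check a constant-time lookup, rather than a join against a separate position-delete file.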

Re: [VOTE] Table V3 Spec: Row Lineage

2024-10-10 Jack Ye
+1, overall agree that we should add this! Best, Jack Ye On Thu, Oct 10, 2024 at 1:43 PM Daniel Weeks wrote: > +1 > > Thanks Russell! > > On Thu, Oct 10, 2024 at 6:57 AM Eduard Tudenhöfner < > etudenhoef...@apache.org> wrote: > >> I left a few comments on the proposal but I'm overall +1 on the

Re: [DISCUSS] Defining a concept of "externally owned" tables in the REST spec

2024-10-10 Jack Ye
Thanks Dennis for raising this! I had a similar discussion last year [1] that I definitely want to discuss more. But I feel the main focus of this discussion is less about external tables, but more about federation vs notification. For this topic, I have 2 questions: (1) To support federation, wha

Re: [DISCUSS] REST: Refreshing vended credentials

2024-10-10 Jack Ye
+1 for adding this in the REST spec. Glue has a similar API GetTemporaryGlueTableCredentials [1], which was introduced because of performance and also security reasons. For example, we don't want to propagate credentials across the compute nodes in the cluster, and each compute node needs to fetch
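The per-node pattern Jack describes, where each compute node obtains its own credentials instead of having them propagated across the cluster, can be sketched as a small cache that re-fetches shortly before expiry. This is a minimal sketch under stated assumptions: `Credential`, `RefreshingCredentialProvider`, the `fetch` callable (standing in for whatever catalog endpoint the proposal settles on), and the 300-second refresh margin are all hypothetical and do not come from the REST spec or from Glue.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Credential:
    config: Dict[str, str]  # opaque storage properties (illustrative only)
    expires_at: float       # expiry as epoch seconds


class RefreshingCredentialProvider:
    """Caches one credential per node and refreshes it before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Credential],  # hypothetical call to the catalog
        refresh_margin_s: float = 300.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._margin = refresh_margin_s
        self._clock = clock
        self._cached: Optional[Credential] = None
        self._lock = threading.Lock()  # one refresh at a time per node

    def get(self) -> Credential:
        with self._lock:
            now = self._clock()
            # Refresh when nothing is cached or expiry is within the margin.
            if self._cached is None or now >= self._cached.expires_at - self._margin:
                self._cached = self._fetch()
            return self._cached
```

Refreshing ahead of expiry by a margin keeps long-running tasks from failing mid-read, and the injectable `clock` makes the behavior easy to test without waiting.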

Re: [Discuss] Iceberg community maintaining the docker images

2024-10-10 rdb...@gmail.com
I was specifically replying to this suggestion to add docker images for Trino and Spark: > I also envision the Iceberg community maintaining some quick-start Docker images, such as spark-iceberg-rest, Trino-iceberg-rest, among others. It sounds like we're mostly agreed that the Iceberg project it

Re: [VOTE] Table V3 Spec: Row Lineage

2024-10-10 Daniel Weeks
+1 Thanks Russell! On Thu, Oct 10, 2024 at 6:57 AM Eduard Tudenhöfner wrote: > I left a few comments on the proposal but I'm overall +1 on the proposal > > On Thu, Oct 10, 2024 at 12:08 PM Jean-Baptiste Onofré > wrote: > >> +1 >> >> I did a review on the proposal and it looks good to me. >> >>

Re: [Discuss] Iceberg community maintaining the docker images

2024-10-10 Jean-Baptiste Onofré
It's actually what I meant by REST Catalog docker image for test. Personally, I would not include any docker images in the Iceberg project (but more in the "iceberg" ecosystem, which is different from the project :)). However, if the community has a different view on that, no problem. Regards JB

Re: [PROPOSAL] Partially Loading Metadata - LoadTable V2

2024-10-10 Daniel Weeks
Hey Haizhou, I think you've done a great job of capturing some of the metadata size related issues in the doc, but I would echo Eduard's comments that we should explore using the existing refs only loading first. This may require adding similar functionality for schemas/logs if we think that is a
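The REST catalog's `loadTable` endpoint already accepts a `snapshots` query parameter whose `refs` value limits the returned snapshots to those referenced by branches or tags. The sketch below shows roughly what that trimming amounts to on the server side. It assumes that "referenced" means the snapshot each ref points to directly, and it treats metadata as a plain dict using the table-metadata JSON field names; `refs_only_metadata` is a hypothetical helper, not the reference implementation.

```python
import copy
from typing import Any, Dict


def refs_only_metadata(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of table metadata keeping only snapshots a branch or tag points to."""
    # "refs" maps each branch/tag name to an object carrying its "snapshot-id".
    referenced = {ref["snapshot-id"] for ref in metadata.get("refs", {}).values()}
    trimmed = copy.deepcopy(metadata)  # leave the caller's metadata untouched
    trimmed["snapshots"] = [
        snap for snap in trimmed.get("snapshots", []) if snap["snapshot-id"] in referenced
    ]
    return trimmed
```

Extending the same idea to schemas or metadata logs, as suggested above, would mean applying a similar filter to those lists, with clients fetching the remainder lazily when needed.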

Re: Iceberg View Spec Improvements

2024-10-10 Daniel Weeks
Walaa, I just want to expand upon what Ryan said a little. The catalog naming issue was identified when we designed the view spec and we opted for simplicity as opposed to trying to solve for catalog name mapping as it really complicates the spec/implementation. There may be ways for implementat

Re: [Discuss] Iceberg community maintaining the docker images

2024-10-10 Daniel Weeks
I think we should focus on the docker image for the test REST Catalog implementation. This is somewhat different from the TCK since it's used by the python/rust/go projects for testing the client side of the REST specification. As for the quickstart/example type images, I'm open to discussing wha

Re: [VOTE] Table V3 Spec: Row Lineage

2024-10-10 Eduard Tudenhöfner
I left a few comments on the proposal but I'm overall +1 on the proposal On Thu, Oct 10, 2024 at 12:08 PM Jean-Baptiste Onofré wrote: > +1 > > I did a review on the proposal and it looks good to me. > > Regards > JB > > On Tue, Oct 8, 2024 at 3:55 PM Russell Spitzer > wrote: > > > > Hi Y'all! >

Re: [PROPOSAL] Partially Loading Metadata - LoadTable V2

2024-10-10 Eduard Tudenhöfner
Hey Haizhou, thanks for working on that proposal. I think my main concern with the current proposal is that it adds quite a lot of complexity at a bunch of places, since you'd need to partially update *TableMetadata*. Additionally, it requires a new endpoint. An alternative to that would be to do

Re: [Discuss] Apache Iceberg 1.6.2 release because of Avro CVE ?

2024-10-10 Ajantha Bhat
If it is already analyzed and not really applicable for Iceberg, we can wait for 1.7.0. Thanks. - Ajantha On Thu, Oct 10, 2024 at 3:41 PM Jean-Baptiste Onofré wrote: > Hi > > I did the security fix in Avro and I can say that Iceberg is not > really impacted and vulnerable. > I'm not against a 1

Re: [Discuss] Iceberg community maintaining the docker images

2024-10-10 Ajantha Bhat
Yes, the PRs I mentioned are about running TCK as a docker container and keeping/maintaining that docker file in the Iceberg repo. I envisioned maintaining other docker images also because I am not sure about the roadmap of the ones in our quickstart

[DISCUSS] REST: Refreshing vended credentials

2024-10-10 Eduard Tudenhöfner
Hey everyone, I'd like to propose a mechanism and changes in order to be able to refresh vended credentials for tables. Please find the proposal doc here. The proposal requires a spec change, which

Re: [Discuss] Replace Hadoop Catalog Examples with JDBC Catalog in Documentation

2024-10-10 Eduard Tudenhöfner
I would prefer to advocate for the REST catalog in those examples/docs (similar to how the Spark quickstart example uses the REST catalog). The docs could then refer to the quickstart example to indicate what's required in terms of services to be start

Re: [Discuss] Iceberg community maintaining the docker images

2024-10-10 Jean-Baptiste Onofré
Hi I think there's context missing here. I agree with Ryan that Iceberg should not provide any docker image or runtime things (we had the same discussion about REST server). However, my understanding is that this discussion is also related to the REST TCK. The TCK validation run needs a runtime,

Re: [Discuss] Replace Hadoop Catalog Examples with JDBC Catalog in Documentation

2024-10-10 Jean-Baptiste Onofré
Hi As we are talking about "documentation" (quick start/readme), I would rather propose to use the REST catalog here instead of JDBC. As it's the catalog we "promote", I think it would be valuable for users to start with the "right thing". JDBC Catalog is interesting for quick test/started guide

Re: [Discuss] Apache Iceberg 1.6.2 release because of Avro CVE ?

2024-10-10 Jean-Baptiste Onofré
Hi I did the security fix in Avro and I can say that Iceberg is not really impacted and vulnerable. I'm not against a 1.6.2 release, but as we discussed Iceberg 1.7.0 by the end of October (see Russell's message a few days ago), maybe we can wait for 1.7.0? Regards JB On Wed, Oct 9, 2024 at 8

Re: [VOTE] Table V3 Spec: Row Lineage

2024-10-10 Jean-Baptiste Onofré
+1 I did a review on the proposal and it looks good to me. Regards JB On Tue, Oct 8, 2024 at 3:55 PM Russell Spitzer wrote: > > Hi Y'all! > > I think we are more or less in agreement on adding Row Lineage to the spec > apart from a few details which may change a bit during implementation. > B

Re: [DISCUSS] REST: Standardize vended credentials in Spec

2024-10-10 Eduard Tudenhöfner
Based on recent discussions the feedback was that we don't want to have anything storage-specific in the OpenAPI spec (other than documenting the different storage configurations, which is handled by #10576). Therefore I've updated the PR and made it f

Re: [Discuss] Apache Iceberg 1.6.2 release because of Avro CVE ?

2024-10-10 Manu Zhang
Hi Ajantha, There is a bug[1] in migration procedures (e.g. add_files) when the option `parallelism` is larger than 1. I've submitted a fix[2] against the main branch and would like to back-port to 1.6.x. [1] https://github.com/apache/iceberg/issues/11147 [2] https://github.com/apache/iceberg/pul