Re: Iceberg View Spec Improvements

2024-10-20 Thread Walaa Eldin Moustafa
Hi Everyone, Thanks for all the discussion so far. I have created a PR to document the requirements https://github.com/apache/iceberg/pull/11365. Please feel free to review it or discuss further in the thread. Thanks, Walaa. On Fri, Oct 11, 2024 at 5:19 PM Walaa Eldin Moustafa wrote: > Hi Ben

Re: Iceberg View Spec Improvements

2024-10-11 Thread Walaa Eldin Moustafa
Hi Benny, > we don't need to list out such restrictions because they really depend on the setup I do not think this is correct. The restrictions do not depend on the setup. They rather dictate it. All restrictions discussed in this thread do that one way or the other. The single engine (Dremio)

Re: Iceberg View Spec Improvements

2024-10-11 Thread Benny Chow
Hi Russell, yes, you listed out the requirements to make the two-Spark-engines case work. Basically, it allows each engine to dynamically resolve the table identifiers under the correct catalog name. Hello Walaa, IMO, we don't need to list out such restrictions because they really depend on the s

Re: Iceberg View Spec Improvements

2024-10-11 Thread Walaa Eldin Moustafa
Benny, "Iceberg View Spec Improvements" includes documenting what is supported and what is not. You listed a few restrictions. Many of them are not documented in the current spec. Documenting them is what this thread is about. We are trying to reach a consensus on the necessary constraints (so we a

Re: Iceberg View Spec Improvements

2024-10-11 Thread Russell Spitzer
The two Spark engines case is the only case I'm stuck on. I'm not sure how you can define a view that works regardless of configuration unless you require that the catalog holding the view is the default catalog (which is a config) and you also only produce catalog-less identifiers. On Fri, Oct 11
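Russell's two-Spark-engines case can be sketched as follows, assuming two hypothetical sessions that register the same physical catalog (`warehouse-1`) under different names (`prod` and `iceberg`). The `resolve` helper, the catalog names, and the single-level namespaces are illustrative simplifications, not Spark or Iceberg APIs.

```python
from typing import Dict, Optional, Tuple

# Physical location of a table: (physical catalog id, namespace, table name).
Resolved = Tuple[str, str, str]


def resolve(identifier: str,
            catalogs: Dict[str, str],
            session_default: str) -> Optional[Resolved]:
    """Resolve a SQL identifier the way a simplified engine session might.

    `catalogs` maps the catalog names configured in this session to
    physical catalog ids; `session_default` is the session's default
    catalog name. Returns None if the catalog name is unknown here.
    Assumes single-level namespaces (hypothetical simplification).
    """
    parts = identifier.split(".")
    if len(parts) == 3:
        catalog_name, namespace, table = parts
    elif len(parts) == 2:
        # Catalog-less identifier: fall back to the session default catalog.
        catalog_name = session_default
        namespace, table = parts
    else:
        raise ValueError(f"expected namespace.table or catalog.namespace.table: {identifier}")
    physical = catalogs.get(catalog_name)
    if physical is None:
        return None
    return (physical, namespace, table)


# Two Spark-like sessions pointing at the same physical catalog under different names.
engine_a = {"catalogs": {"prod": "warehouse-1"}, "session_default": "prod"}
engine_b = {"catalogs": {"iceberg": "warehouse-1"}, "session_default": "iceberg"}

for sql_id in ["prod.db.orders", "db.orders"]:
    print(sql_id, resolve(sql_id, **engine_a), resolve(sql_id, **engine_b))
```

In this model the catalog-qualified form fails in the second session, while the catalog-less form resolves identically in both only because the catalog holding the view is each session's default, which is the configuration dependency described above.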

Re: Iceberg View Spec Improvements

2024-10-11 Thread Benny Chow
Having spent some time testing Nessie views with multiple engines (Dremio + Spark) using different catalog names and different namespaces, I tend to agree with Dan and Amogh that the current view spec is fine. Unlike tables, I think when it comes to views, engines have to "work together" if they e

Re: Iceberg View Spec Improvements

2024-10-10 Thread Amogh Jahagirdar
I took another pass over the view spec, and I believe that how identifiers are represented and how engines should resolve references are both clear. So from my perspective, at the moment we do not need to change the view spec itself. I do acknowledge, though, that practically there ca

Re: Iceberg View Spec Improvements

2024-10-10 Thread Daniel Weeks
Russell, I think there are a few existing ways to support that. For example, if you exclude the default catalog and fully reference the table with <catalog>.<namespace>.<table>, most SQL engines will interpret that correctly (for cross or known catalogs). Also, if you omit the catalog and use just <namespace>.<table>, it must use the cata
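The resolution rule Dan describes can be sketched roughly like this. It assumes the view version carries a default namespace and an optional default catalog (loosely mirroring the view spec's `default-namespace` and `default-catalog` fields), that an omitted catalog falls back to the catalog storing the view, and that namespaces are single-level. `ViewVersion` and `resolve_reference` are hypothetical names for illustration, not the Iceberg library API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ViewVersion:
    """Hypothetical subset of view-version metadata used for resolution."""
    default_namespace: str
    default_catalog: Optional[str] = None


def resolve_reference(ref: str, view: ViewVersion, view_catalog: str) -> Tuple[str, str, str]:
    """Return (catalog, namespace, table) for a table reference in the view SQL.

    Assumes single-level namespaces, so the number of dot-separated parts
    tells us which pieces the reference omitted.
    """
    parts = ref.split(".")
    # Catalog used when the reference omits one: the view's default catalog
    # if set, otherwise the catalog that stores the view.
    fallback_catalog = view.default_catalog or view_catalog
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return (fallback_catalog, parts[0], parts[1])
    if len(parts) == 1:
        return (fallback_catalog, view.default_namespace, parts[0])
    raise ValueError(f"unsupported reference: {ref}")


view = ViewVersion(default_namespace="sales")
print(resolve_reference("orders", view, view_catalog="prod"))
print(resolve_reference("finance.ledger", view, view_catalog="prod"))
print(resolve_reference("other.db.t", view, view_catalog="prod"))
```

Only the fully qualified form is independent of where the view is loaded from; the shorter forms inherit the view's catalog, so they stay correct only if every engine agrees on what that catalog is.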

Re: Iceberg View Spec Improvements

2024-10-10 Thread Walaa Eldin Moustafa
Hi Russell, Would this be a good candidate for a future version of the spec? Thanks, Walaa. On Thu, Oct 10, 2024 at 3:50 PM Russell Spitzer wrote: > I still have an issue with representations not having explicit ways of > incorporating the catalog name, I'm thinking about our potential future

Re: Iceberg View Spec Improvements

2024-10-10 Thread Russell Spitzer
I still have an issue with representations not having explicit ways of incorporating the catalog name, I'm thinking about our potential future situation where we want to return a view for Fine Grained Access policies. In that case won't the Catalog need to craft a representation that matches the co

Re: Iceberg View Spec Improvements

2024-10-10 Thread Walaa Eldin Moustafa
Thanks Dan. I am +1 for documenting unsupported configurations. On Thu, Oct 10, 2024 at 3:34 PM Daniel Weeks wrote: > Hey Walaa, > > I recognize the issue you're calling out but disagree there is an implicit > assumption in the spec. The spec clearly says how identifiers including > catalogs an

Re: Iceberg View Spec Improvements

2024-10-10 Thread Daniel Weeks
Hey Walaa, I recognize the issue you're calling out but disagree there is an implicit assumption in the spec. The spec clearly says how identifiers including catalogs and namespaces are represented/stored and how references need to be resolved. The idea that a catalog may not match is an environ

Re: Iceberg View Spec Improvements

2024-10-10 Thread Walaa Eldin Moustafa
Hi Dan, I think there are a few questions that we should solve to decide the path forward:
* Does the current spec contain implicit assumptions? I think the answer is yes. I think this is also what Ryan indicated here [1].
* Do these implicit assumptions make it difficult to adopt the spec or

Re: Iceberg View Spec Improvements

2024-10-10 Thread Daniel Weeks
Walaa, I just want to expand upon what Ryan said a little. The catalog naming issue was identified when we designed the view spec and we opted for simplicity as opposed to trying to solve for catalog name mapping as it really complicates the spec/implementation. There may be ways for implementat

Re: Iceberg View Spec Improvements

2024-10-09 Thread Walaa Eldin Moustafa
Thanks Ryan and everyone who left feedback on the doc. Let me clarify a few things. "Improving the spec" also includes making the implicit assumptions explicitly stated in the spec. Explicitly stating the assumptions is discussed under the "Portable table identifiers" section in the doc. I am onb

Re: Iceberg View Spec Improvements

2024-10-09 Thread rdb...@gmail.com
+1 for Steven's comment. There is already an implicit assumption that the catalog names are consistent across engines. The best practice is to not reference identifiers across catalogs, but there isn't much we can do about the assumption here without rewriting SQL to fully qualify identifiers. On

Re: Iceberg View Spec Improvements

2024-10-08 Thread Walaa Eldin Moustafa
Hi Steven, Assumption 1 in "Portable SQL table identifiers" states: *All engines resolve a fully specified SQL identifier x.y.z to the same storage table identifier b.c in the same catalog a.* I think this assumption encodes the 4th assumption you shared. Assuming "x.y.z" resolves to "b.c" in st
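Assumption 1 can be phrased as a checkable property, sketched here under simplifying assumptions: each engine is modeled as a hypothetical mapping from SQL catalog names to physical catalogs, and the namespace and table parts (`y.z`) are assumed to map unchanged to the storage identifier (`b.c`). `is_portable` is an illustrative name, not part of any spec or library.

```python
from typing import Dict, Iterable, Optional, Tuple

# Storage-level identity of a table: (physical catalog, namespace, table).
StorageId = Tuple[str, str, str]


def resolve_fully_specified(sql_id: str, catalog_map: Dict[str, str]) -> Optional[StorageId]:
    """Map a fully specified x.y.z identifier to storage, or None if x is unknown."""
    catalog_name, namespace, table = sql_id.split(".")
    physical = catalog_map.get(catalog_name)
    return None if physical is None else (physical, namespace, table)


def is_portable(sql_id: str, engines: Iterable[Dict[str, str]]) -> bool:
    """True if every engine resolves sql_id to the same, existing storage table."""
    results = {resolve_fully_specified(sql_id, m) for m in engines}
    return len(results) == 1 and None not in results


spark = {"a": "cat-1"}
trino = {"a": "cat-1"}
flink = {"prod": "cat-1"}   # same physical catalog, different SQL name
print(is_portable("a.b.c", [spark, trino]))          # True
print(is_portable("a.b.c", [spark, trino, flink]))   # False
```

Framed this way, a consistent-catalog-name requirement is one sufficient condition for the property to hold, since it makes every engine's mapping agree on `a`.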

Re: Iceberg View Spec Improvements

2024-10-08 Thread Steven Wu
Walaa, it doesn't seem to me that the doc captured Russell's idea. There could be a new assumption 4: if the catalog name is part of the table identifier, it should be consistent across engines. Catalog federation can achieve the normalization/standardization of the catalog names. On Tue, Oct 8, 2

Re: Iceberg View Spec Improvements

2024-10-08 Thread Walaa Eldin Moustafa
Just opened Comment access to the doc. Link here again for convenience [1]. [1] https://docs.google.com/document/d/1e5orD_sBv0VlNNLZRgUtalVUllGuztnAGTtqo8J0UG8/edit Thanks, Walaa. On Tue, Oct 8, 2024 at 10:42 AM Walaa Eldin Moustafa wrote: > Thanks Steven! I think this fits in the framework o

Re: Iceberg View Spec Improvements

2024-10-08 Thread Walaa Eldin Moustafa
Thanks Steven! I think this fits in the framework of "portable table identifiers" in the doc. I have stated the assumptions that should be added to the Iceberg spec in that case (in the doc they are more abstract/generic than the version you shared). Would be great to provide your feedback on the a

Re: Iceberg View Spec Improvements

2024-10-08 Thread Steven Wu
I'd like to follow up on Russell's suggestion of using a federated catalog for resolving the catalog name/alias problem. I think Russell's idea is that the federated catalog standardizes the catalog names (for referencing). That could solve the problem. There are two cases: (1) single catalog: there i
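The name-standardization idea can be sketched as a small alias registry, assuming a hypothetical federated service where each engine registers its local alias for a canonical catalog name, views store canonical names, and each engine translates them back to its own alias on read. `FederatedCatalogNames` and its methods are invented for illustration and do not correspond to any existing Iceberg or REST catalog API.

```python
from typing import Dict


class FederatedCatalogNames:
    """Hypothetical registry that standardizes catalog names across engines."""

    def __init__(self) -> None:
        # engine name -> {canonical catalog name -> local alias}
        self._aliases: Dict[str, Dict[str, str]] = {}

    def register(self, engine: str, canonical: str, local_alias: str) -> None:
        self._aliases.setdefault(engine, {})[canonical] = local_alias

    def to_canonical(self, engine: str, sql_id: str) -> str:
        """Rewrite the catalog part of catalog.namespace.table to its canonical name."""
        catalog, rest = sql_id.split(".", 1)
        for canonical, alias in self._aliases.get(engine, {}).items():
            if alias == catalog:
                return f"{canonical}.{rest}"
        raise KeyError(f"{engine} has no canonical name for catalog {catalog!r}")

    def to_local(self, engine: str, sql_id: str) -> str:
        """Rewrite a canonical catalog.namespace.table into the engine's alias."""
        canonical, rest = sql_id.split(".", 1)
        try:
            alias = self._aliases[engine][canonical]
        except KeyError:
            raise KeyError(f"{engine} has no alias for canonical catalog {canonical!r}") from None
        return f"{alias}.{rest}"


names = FederatedCatalogNames()
names.register("spark", canonical="sales_wh", local_alias="prod")
names.register("trino", canonical="sales_wh", local_alias="iceberg")

stored = names.to_canonical("spark", "prod.db.orders")  # what the view would store
print(stored)                                           # sales_wh.db.orders
print(names.to_local("trino", stored))                  # iceberg.db.orders
```

The design choice here is that only the canonical name is persisted in the view, so engines never need to agree on local aliases, only on registering them with the shared service.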