The two Spark engines case is the only case I'm stuck on. I'm not sure how you can define a view that works regardless of configuration unless you require that the catalog holding the view is the default catalog (which is a config) and you also only produce catalog-less identifiers.
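A minimal sketch of the two-engine case above follows. It assumes the resolution rule as Dan summarizes it further down the thread. A reference without a catalog falls back to the view version's default catalog if one was recorded, otherwise to the catalog that stores the view. A reference without a namespace also takes the default namespace. `ViewVersion`, `resolve`, `physical_catalog`, and every catalog name are hypothetical illustrations, modeled loosely on the spec's `default-catalog` and `default-namespace` fields rather than on any library API. The sketch also assumes both engines actually follow that rule. It does not model an engine that falls back to its own session default instead, which is part of the configuration dependence raised above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Identifier = Tuple[str, str, str]  # (catalog, namespace, table)


@dataclass(frozen=True)
class ViewVersion:
    # Hypothetical, trimmed-down stand-in for the two view-version fields
    # that matter here; this is not the Iceberg library's API.
    default_catalog: Optional[str]
    default_namespace: str


def resolve(reference: str, version: ViewVersion, view_catalog: str) -> Identifier:
    """Expand a dotted table reference from the view SQL to a 3-part identifier.

    view_catalog is the name the *reading* engine uses for the catalog that
    stores the view. Assumes single-level namespaces and no dots inside names.
    """
    parts = reference.split(".")
    # A missing catalog falls back to default-catalog if it was recorded,
    # otherwise to the catalog that stores the view.
    catalog = version.default_catalog or view_catalog
    if len(parts) == 1:
        return (catalog, version.default_namespace, parts[0])
    if len(parts) == 2:
        return (catalog, parts[0], parts[1])
    if len(parts) == 3:
        return (parts[0], parts[1], parts[2])
    raise ValueError(f"unsupported reference: {reference!r}")


def physical_catalog(ident: Identifier, session: Dict[str, str]) -> Optional[str]:
    # Look up which physical catalog the engine session binds this name to;
    # None means the name is not configured in that session.
    return session.get(ident[0])


# Two Spark sessions pointing at the same physical catalog under different
# local names (all names here are made up for illustration).
session_a = {"prod": "shared-catalog"}
session_b = {"iceberg": "shared-catalog"}

# A view created from session A that recorded its current catalog name ...
recorded = ViewVersion(default_catalog="prod", default_namespace="db")
# ... versus one that left default-catalog unset.
unset = ViewVersion(default_catalog=None, default_namespace="db")
```

Consider session B, which knows the view's catalog as `iceberg`. There, `resolve("db.events", recorded, "iceberg")` gives `("prod", "db", "events")`, and `physical_catalog` returns `None` because session B has no catalog named `prod`. With `unset`, the same reference resolves to `("iceberg", "db", "events")` in session B and to `("prod", "db", "events")` in session A, and both reach `shared-catalog`. A fully qualified `prod.db.events` fails in session B either way. Under these assumptions, only catalog-less identifiers with no recorded default catalog work in both sessions.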
On Fri, Oct 11, 2024 at 10:08 AM Benny Chow <btc...@gmail.com> wrote:

> Having spent some time testing Nessie views with multiple engines (Dremio + Spark) using different catalog names and different namespaces, I tend to agree with Dan and Amogh that the current view spec is fine. Unlike tables, I think when it comes to views, engines have to "work together" if they expect to share the views. Working together means:
>
> Providing multiple SQL representations
> Not using engine-specific operators or UDFs
> Not using engine-specific row/column access policies
> Not using engine-specific role-based access control features such as view delegation (e.g. query user vs. view owner)
> Not using fully qualified SQL identifiers when engines don't standardize on catalog names
> Using standardized catalog names if cross-catalog joins are needed in view SQLs
>
> Some of the above limitations also exist even for the same engine when you have, for example, two Spark clusters pointing to the same catalog and each cluster uses different catalog names.
>
> Best
> Benny
>
>
> On Thu, Oct 10, 2024 at 8:51 PM Amogh Jahagirdar <2am...@gmail.com> wrote:
>
>> I took another pass over the view spec and I believe that the representation of identifiers and how engines should resolve references are both clear. So from my perspective, at the moment we do not need to change the view spec itself.
>>
>> I do acknowledge, though, that in practice there can be scenarios where catalog names are inconsistent across environments, and this has led to confusion when developing the MV spec (I'm remembering based on last week's community sync). This thread already contains some recommendations that implementations can use to address these inconsistencies, but I don't think adding more complexity to the view spec via some form of normalizing/mapping identifiers is worth it for these cases.
>> I think in its current state it's a sufficient model for developing MVs, and it shouldn't block progress on that.
>>
>> I'm +1 on adding an "unsupported configurations" clarification, though. It's become clear to me that there's enough confusion around the implications of the SQL identifiers in the spec that it's worth calling it out.
>>
>> Thanks,
>>
>> Amogh Jahagirdar
>>
>> On Thu, Oct 10, 2024 at 5:08 PM Daniel Weeks <dwe...@apache.org> wrote:
>>
>>> Russell,
>>>
>>> I think there are a few existing ways to support that. For example, if you exclude the default catalog and fully reference the table with <catalog>.<db>.<table>, most SQL engines will interpret that correctly (for cross or known catalogs). Also, if you omit the catalog and use just <db>.<table>, it must use the catalog in which the view is defined (per the spec), which I think addresses your case.
>>>
>>> Server-side rewrite is possible, but I think we'd need to explore the specific cases, which we'll probably need to do as we consider secure views more closely.
>>>
>>> -Dan
>>>
>>> On Thu, Oct 10, 2024 at 3:59 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>
>>>> Hi Russell,
>>>>
>>>> Would this be a good candidate for a future version of the spec?
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>>
>>>> On Thu, Oct 10, 2024 at 3:50 PM Russell Spitzer <russell.spit...@gmail.com> wrote:
>>>>
>>>>> I still have an issue with representations not having an explicit way of incorporating the catalog name. I'm thinking about a potential future situation where we want to return a view for fine-grained access policies. In that case, won't the catalog need to craft a representation that matches the configuration of the engine? Doesn't this mean the client will have to tell the catalog what its local name is?
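Russell's question, together with the server-side rewrite Dan mentions, can be made concrete with a small sketch. Nothing like this exists in the view spec or in any catalog API that the thread cites. The function `rewrite_for_client`, its parameters, and the idea that a client sends a `local_names` mapping with its request are all hypothetical. The sketch expands each stored reference using the server's own catalog names, then re-qualifies it with the name the requesting engine reported. The result therefore no longer depends on the engine's default catalog. It works on bare identifiers only; rewriting real view SQL would need a SQL parser.

```python
from typing import Dict, List


def rewrite_for_client(refs: List[str], home_catalog: str, default_namespace: str,
                       local_names: Dict[str, str]) -> List[str]:
    """Hypothetical server-side rewrite of a view's table references.

    refs: table references as stored, using the server's catalog names.
    home_catalog: the server's name for the catalog that stores the view.
    default_namespace: namespace used for single-part references.
    local_names: server catalog name -> name the requesting engine uses,
        which the client would have to send with its request (assumption).
    Assumes single-level namespaces and no dots inside names.
    """
    rewritten = []
    for ref in refs:
        parts = ref.split(".")
        # Expand partial references using the server's own naming first.
        if len(parts) == 1:
            parts = [home_catalog, default_namespace, parts[0]]
        elif len(parts) == 2:
            parts = [home_catalog, parts[0], parts[1]]
        elif len(parts) != 3:
            raise ValueError(f"unsupported reference: {ref!r}")
        if parts[0] not in local_names:
            # Without the client's name for this catalog, the server cannot
            # produce a reference the engine will resolve.
            raise KeyError(f"no client-side name for catalog {parts[0]!r}")
        rewritten.append(".".join([local_names[parts[0]], parts[1], parts[2]]))
    return rewritten
```

Here is a usage example with made-up names. `rewrite_for_client(["events", "db.users", "sales.db.orders"], "warehouse", "db", {"warehouse": "iceberg", "sales": "sales_cat"})` returns `["iceberg.db.events", "iceberg.db.users", "sales_cat.db.orders"]`. The `KeyError` branch is the point of the question: the rewrite only works if the client tells the catalog what it calls each referenced catalog.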
>>>>>
>>>>> On Thu, Oct 10, 2024 at 5:34 PM Daniel Weeks <dwe...@apache.org> wrote:
>>>>>
>>>>>> Hey Walaa,
>>>>>>
>>>>>> I recognize the issue you're calling out but disagree that there is an implicit assumption in the spec. The spec clearly says how identifiers, including catalogs and namespaces, are represented/stored and how references need to be resolved. The idea that a catalog may not match is an environmental/infrastructure/configuration issue related to where they are being referenced from.
>>>>>>
>>>>>> If we think this is sufficiently confusing to people, I would be open to discussing an "unsupported configurations" callout, but I don't think this blocks work and am somewhat skeptical that it's necessary.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 10, 2024 at 2:47 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> I think there are a few questions that we should answer to decide the path forward:
>>>>>>>
>>>>>>> * Does the current spec contain implicit assumptions?
>>>>>>> I think the answer is yes. I think this is also what Ryan indicated here [1].
>>>>>>>
>>>>>>> * Do these implicit assumptions make it difficult to adopt the spec or evolve it in the correct way?
>>>>>>> I think the answer is yes as well. MV design discussions became quite complicated because most contributors had a different understanding of the spec compared to what it encodes as implicit assumptions (see this thread for an example [2] -- there are a few more). This misaligned understanding could lead to inaccurate designs and potentially result in unneeded further constraints or unneeded engineering complexity.
>>>>>>>
>>>>>>> * What are the implicit assumptions (stated in an unambiguous way)?
>>>>>>> I do not think the answer is clear to everyone, even at this point. There have been a few variations of those assumptions in this thread alone. I think we should converge on a clear set of assumptions for everyone's consumption.
>>>>>>>
>>>>>>> * Should we add the assumptions explicitly to the spec?
>>>>>>> I think we definitely should. Adoption or extension of the spec will be quite difficult if the assumptions are not clearly stated and are interpreted differently by different contributors.
>>>>>>>
>>>>>>> It would be great to hear the community's feedback on whether they agree with the answers to the above questions.
>>>>>>>
>>>>>>> [1] https://lists.apache.org/thread/s1hjnc163ny76smv2l0t2sxxn93s4595
>>>>>>> [2] https://lists.apache.org/thread/0wzowd15328rnwvotzcoo4jrdyrzlx91
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Walaa.