Re: Iceberg View Spec Improvements

Russell Spitzer Fri, 11 Oct 2024 08:25:55 -0700

The two Spark engines case is the only case I'm stuck on. I'm not sure how
you can define a view that works regardless of configuration unless you
require that the catalog holding the view is the default catalog (which is
a config) and you also only produce catalog-less identifiers.
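
A minimal sketch of the two-cluster case, not taken from the spec or any
engine's code. It assumes single-level namespaces and ignores
default-namespace handling; the function name `resolve`, the catalog names
`prod` and `lake`, and the table `db.events` are all hypothetical. It shows
why only the catalog-less identifier resolves on both clusters.

```python
from typing import Set, Tuple


def resolve(identifier: str, view_catalog: str, configured: Set[str]) -> Tuple[str, str, str]:
    """Resolve a table reference found in a view's SQL text.

    view_catalog: the name THIS engine uses for the catalog holding the view.
    configured:   the catalog names configured on this engine.
    """
    parts = identifier.split(".")
    if len(parts) == 2:
        # Catalog-less reference: resolve in the catalog that holds the view.
        db, table = parts
        return (view_catalog, db, table)
    if len(parts) == 3:
        # Fully qualified reference: only works if this engine happens to
        # use the same catalog name as the engine that wrote the view.
        catalog, db, table = parts
        if catalog not in configured:
            raise LookupError(f"unknown catalog {catalog!r}")
        return (catalog, db, table)
    raise ValueError(f"unsupported identifier {identifier!r}")


# Two Spark clusters pointing at the same catalog under different local names.
cluster_a = {"prod"}
cluster_b = {"lake"}

print(resolve("db.events", "prod", cluster_a))
print(resolve("db.events", "lake", cluster_b))
try:
    resolve("prod.db.events", "lake", cluster_b)
except LookupError as err:
    print("cluster B:", err)
```

The catalog-less form works on both clusters only because each one loads the
view from the same underlying catalog, whatever it is called locally.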


On Fri, Oct 11, 2024 at 10:08 AM Benny Chow <btc...@gmail.com> wrote:

> Having spent some time testing Nessie views with multiple engines
> (Dremio + Spark) using different catalog names and different namespaces, I
> tend to agree with Dan and Amogh that the current view spec is fine.
> Unlike tables, I think when it comes to views, engines have to "work
> together" if they expect to share the views.  Working together means:
>
> - Providing multiple SQL representations
> - Not using engine specific operators or UDFs
> - Not using engine specific row column access policies
> - Not using engine specific role based access control features such as
>   view delegation (ex. query user vs view owner)
> - Not using fully qualified SQL identifiers when engines don't standardize
>   on catalog names
> - Using standardized catalog names if cross catalog joins are needed in
>   view SQLs
>
> Some of the above limitations also exist even for the same engine when you
> have for example two Spark clusters pointing to the same catalog and each
> cluster uses different catalog names.
>
> Best
> Benny
>
>
> On Thu, Oct 10, 2024 at 8:51 PM Amogh Jahagirdar <2am...@gmail.com> wrote:
>
>> I took another pass over the view spec and I believe that how identifiers
>> are represented and how engines should resolve references are both clear.
>> So from my perspective, at the moment we do not need to change the view
>> spec itself.
>>
>> I do acknowledge though that practically there can be scenarios where
>> catalog names are inconsistent across environments and this has led to
>> confusion when developing the MV spec (I'm remembering based on last week's
>> community sync). There are some recommendations so that implementations can
>> address these inconsistencies in this thread already, but I don't think
>> adding some more complexity to the view spec via some form of
>> normalizing/mapping identifiers is worth it for these cases. I think in its
>> current state it's a sufficient model for developing MVs, and shouldn't
>> block progression on that.
>>
>> I'm +1 on adding an "unsupported configurations" clarification though,
>> it's become clear to me that there's enough confusion around the
>> implications of the SQL identifiers in the spec that it's worth calling it
>> out.
>>
>> Thanks,
>>
>> Amogh Jahagirdar
>>
>> On Thu, Oct 10, 2024 at 5:08 PM Daniel Weeks <dwe...@apache.org> wrote:
>>
>>> Russell,
>>>
>>> I think there are a few existing ways to support that.  For example, if
>>> you exclude the default catalog and fully reference the table with
>>> <catalog>.<db>.<table> most sql engines will interpret that correctly (for
>>> cross or known catalogs).  Also, if you omit the catalog and use just
>>> <db>.<table>, it must use the catalog in which the view is defined (per the
>>> spec), which I think addresses your case.
>>>
>>> Server-side rewrite is possible, but I think we'd need to explore the
>>> specific cases, which we'll probably need to do as we consider secure views
>>> more closely.
>>>
>>> -Dan
>>>
>>> On Thu, Oct 10, 2024 at 3:59 PM Walaa Eldin Moustafa <
>>> wa.moust...@gmail.com> wrote:
>>>
>>>> Hi Russell,
>>>>
>>>> Would this be a good candidate for a future version of the spec?
>>>>
>>>> Thanks,
>>>> Walaa.
>>>>
>>>>
>>>> On Thu, Oct 10, 2024 at 3:50 PM Russell Spitzer <
>>>> russell.spit...@gmail.com> wrote:
>>>>
>>>>> I still have an issue with representations not having explicit ways of
>>>>> incorporating the catalog name. I'm thinking about our potential future
>>>>> situation where we want to return a view for Fine Grained Access policies.
>>>>> In that case won't the Catalog need to craft a representation that matches
>>>>> the configuration of the engine? Doesn't this mean the client will have to
>>>>> tell the Catalog what its local name is?
>>>>>
>>>>> On Thu, Oct 10, 2024 at 5:34 PM Daniel Weeks <dwe...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Hey Walaa,
>>>>>>
>>>>>> I recognize the issue you're calling out but disagree there is an
>>>>>> implicit assumption in the spec.  The spec clearly says how identifiers
>>>>>> including catalogs and namespaces are represented/stored and how
>>>>>> references need to be resolved.  The idea that a catalog may not match
>>>>>> is an environmental/infrastructure/configuration issue related to where
>>>>>> they are being referenced from.
>>>>>>
>>>>>> If we think this is sufficiently confusing to people, I would be open
>>>>>> to discussing an "unsupported configurations" callout, but I don't think
>>>>>> this blocks work and am somewhat skeptical that it's necessary.
>>>>>>
>>>>>> -Dan
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 10, 2024 at 2:47 PM Walaa Eldin Moustafa <
>>>>>> wa.moust...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> I think there are a few questions that we should solve to decide the
>>>>>>> path forward:
>>>>>>>
>>>>>>> * Does the current spec contain implicit assumptions?
>>>>>>> I think the answer is yes. I think this is also what Ryan indicated
>>>>>>> here [1].
>>>>>>>
>>>>>>> * Do these implicit assumptions make it difficult to adopt the spec or
>>>>>>> evolve it in the correct way?
>>>>>>> I think the answer is yes as well. MV design discussions became quite
>>>>>>> complicated because most contributors had a different understanding of
>>>>>>> the spec compared to what it encodes as implicit assumptions (see this
>>>>>>> thread for an example [2] -- there are a few more). This unaligned
>>>>>>> understanding could possibly lead to inaccurate designs and potentially
>>>>>>> result in unneeded further constraints or unneeded engineering
>>>>>>> complexity.
>>>>>>>
>>>>>>> * What are the implicit assumptions (in an unambiguous way)?
>>>>>>> I do not think the answer is clear to everyone, even at this point.
>>>>>>> There have been a few variations of those assumptions in this thread
>>>>>>> alone. I think we should converge on a clear set of assumptions for
>>>>>>> everyone's consumption.
>>>>>>>
>>>>>>> * Should we add the assumptions explicitly to the spec?
>>>>>>> I think we definitely should. Adoption or extension of the spec will
>>>>>>> be quite difficult if the assumptions are not clearly stated and are
>>>>>>> interpreted differently by different contributors.
>>>>>>>
>>>>>>> Would be great to hear the community's feedback on whether they
>>>>>>> agree with the answers to the above questions.
>>>>>>>
>>>>>>> [1] https://lists.apache.org/thread/s1hjnc163ny76smv2l0t2sxxn93s4595
>>>>>>> [2] https://lists.apache.org/thread/0wzowd15328rnwvotzcoo4jrdyrzlx91
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Walaa.
>>>>>>>
>>>>>>
