Re: [DISCUSS] ViewCatalog interface for DSv2

John Zhuge Mon, 14 Oct 2019 09:04:10 -0700

Thanks for the feedback. I am preparing a doc and a PoC and will post them soon.

On Mon, Oct 14, 2019 at 3:17 AM Wenchen Fan <cloud0...@gmail.com> wrote:


> I'm fine with the view definition proposed here, but my major concern is
> how to make sure table/view share the same namespace. According to the SQL
> spec, if there is a view named "a", we can't create a table named "a"
> anymore.
>
> We can document this and ask implementations to guarantee it, but it
> would be better if the API itself guaranteed it.
>
> On Wed, Aug 14, 2019 at 1:46 AM John Zhuge <jzh...@apache.org> wrote:
>
>> Thanks for the feedback, Ryan! I can share the WIP copy of the SPIP if
>> that makes sense.
>>
>> I couldn't find much about view resolution and validation in SQL Spec
>> Part 1. Anyone with thorough knowledge of the SQL spec is welcome to
>> chime in.
>>
>> Here is my understanding based on online manuals, docs, and other
>> resources:
>>
>>    - A view has a name in the database schema so that other queries can
>>    use it like a table.
>>    - A view's schema is frozen at the time the view is created;
>>    subsequent changes to underlying tables (e.g. adding a column) will not be
>>    reflected in the view's schema. If an underlying table is dropped or
>>    changed in an incompatible fashion, subsequent attempts to query the
>>    invalid view will fail.
>>
>> In Presto, view columns are used for validation only (see
>> StatementAnalyzer.Visitor#isViewStale):
>>
>>    - view column names must match the visible fields of analyzed view sql
>>    - the visible fields can be coerced to view column types
>>
>> In Spark 2.2+, view columns are also used for validation (see
>> CheckAnalysis#checkAnalysis case View):
>>
>>    - view column names must match the output fields of the view sql
>>    - view column types must be able to UpCast to output field types
>>
>> Rule EliminateView adds a Project to viewQueryColumnNames if it exists.
>>
>> As for `softwareVersion`, the purpose is to track which software version
>> was used to create the view, in preparation for handling different
>> versions of the same software or even different software, such as
>> Presto vs. Spark.
>>
>>
>> On Tue, Aug 13, 2019 at 9:47 AM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> Thanks for working on this, John!
>>>
>>> I'd like to see a more complete write-up of what you're proposing.
>>> Without that, I don't think we can have a productive discussion about this.
>>>
>>> For example, I think you're proposing to keep the view columns to ensure
>>> that the same columns are produced by the view every time, based on
>>> requirements from the SQL spec. Let's start by stating what those behavior
>>> requirements are, so that everyone has the context to understand why your
>>> proposal includes the view columns. Similarly, I'd like to know why you're
>>> proposing `softwareVersion` in the view definition.
>>>
>>> On Tue, Aug 13, 2019 at 8:56 AM John Zhuge <jzh...@apache.org> wrote:
>>>
>>>> Catalog support has been added to DSv2 along with a table catalog
>>>> interface. Here I'd like to propose a view catalog interface, for the
>>>> following benefits:
>>>>
>>>>    - Abstraction for view management thus allowing different view
>>>>    backends
>>>>    - Disassociation of view definition storage from Hive Metastore
>>>>
>>>> A catalog plugin can be both a TableCatalog and a ViewCatalog. An
>>>> identifier is resolved as a view first, then as a table.
>>>>
>>>> More details will follow in an SPIP and a PR if we decide to proceed.
>>>> Here is a quick glance at the API:
>>>>
>>>> ViewCatalog interface:
>>>>
>>>>    - loadView
>>>>    - listViews
>>>>    - createView
>>>>    - deleteView
>>>>
>>>> View interface:
>>>>
>>>>    - name
>>>>    - originalSql
>>>>    - defaultCatalog
>>>>    - defaultNamespace
>>>>    - viewColumns
>>>>    - owner
>>>>    - createTime
>>>>    - softwareVersion
>>>>    - options (map)
>>>>
>>>> ViewColumn interface:
>>>>
>>>>    - name
>>>>    - type
>>>>
>>>>
>>>> Thanks,
>>>> John Zhuge
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>> --
>> John Zhuge
>>
>

-- 
John Zhuge
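
One way to picture the concern raised in the thread, that a table and a view must not share a name and that the API rather than documentation should guarantee it, is a catalog that keeps both kinds of object in a single map. The following is only a minimal sketch under that assumption. The class SharedNamespaceCatalog, the Kind enum, and all method names are hypothetical and are not part of Spark's DSv2 API or of the proposal.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Spark's DSv2 API.
final class SharedNamespaceCatalog {
    enum Kind { TABLE, VIEW }

    // One map holds both tables and views, so a name can be bound only once.
    private final Map<String, Kind> entries = new HashMap<>();

    void createTable(String name) { create(name, Kind.TABLE); }

    void createView(String name) { create(name, Kind.VIEW); }

    private void create(String name, Kind kind) {
        // putIfAbsent returns the previous binding, or null if the name was free.
        Kind existing = entries.putIfAbsent(name, kind);
        if (existing != null) {
            throw new IllegalStateException(
                existing + " named \"" + name + "\" already exists");
        }
    }

    // Returns the kind bound to the name, or null if nothing is bound.
    Kind resolve(String name) { return entries.get(name); }

    public static void main(String[] args) {
        SharedNamespaceCatalog catalog = new SharedNamespaceCatalog();
        catalog.createView("a");
        try {
            catalog.createTable("a");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(catalog.resolve("a"));
    }
}
```

With a single map, the "view first, then table" resolution order no longer matters, because a name can refer to at most one object. Whether a real catalog backend can offer that atomicity is a separate question that this sketch does not address.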

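
The validation rules quoted in the thread (stored view column names must match the fields produced by the view SQL, and the field types must be coercible to the stored column types) can be illustrated with a toy check. This is a sketch only: the class ViewStalenessCheck, the Column type, the string type names, and the widening order are assumptions made for illustration, and it does not reproduce Presto's isViewStale or Spark's CheckAnalysis logic.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: the types and the widening order are illustrative only.
final class ViewStalenessCheck {
    // Toy integral types in widening order; each can be coerced to any later one.
    private static final List<String> WIDENING =
        Arrays.asList("byte", "short", "int", "long");

    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    // True if a value of type `from` can be widened to type `to` without loss.
    static boolean canCoerce(String from, String to) {
        if (from.equals(to)) {
            return true;
        }
        int f = WIDENING.indexOf(from);
        int t = WIDENING.indexOf(to);
        return f >= 0 && t >= 0 && f < t;
    }

    // A view is stale when the analyzed query's fields no longer line up,
    // position by position, with the columns stored at creation time.
    static boolean isStale(List<Column> viewColumns, List<Column> queryFields) {
        if (viewColumns.size() != queryFields.size()) {
            return true;
        }
        for (int i = 0; i < viewColumns.size(); i++) {
            Column stored = viewColumns.get(i);
            Column actual = queryFields.get(i);
            // Names are compared case-sensitively here; real engines may differ.
            if (!stored.name.equals(actual.name)) {
                return true;
            }
            if (!canCoerce(actual.type, stored.type)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Column> view = Arrays.asList(
            new Column("id", "long"), new Column("name", "string"));
        List<Column> widened = Arrays.asList(
            new Column("id", "int"), new Column("name", "string"));
        List<Column> renamed = Arrays.asList(
            new Column("id", "int"), new Column("title", "string"));
        System.out.println(isStale(view, widened)); // false
        System.out.println(isStale(view, renamed)); // true
    }
}
```

Storing the columns with the view is what makes this check possible: without them there is nothing to compare the re-analyzed query against.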