Hi Spark community,

I’d like to restart the vote for the ViewCatalog design proposal (SPIP <https://docs.google.com/document/d/1XOxFtloiMuW24iqJ-zJnDzHl2KMxipTjJoxleJFz66A/edit?usp=sharing>).
The proposal is to add a ViewCatalog interface that can be used to load, create, alter, and drop views in DataSourceV2.

Please vote on the SPIP in the next 72 hours. Once it is approved, I’ll update the PR <https://github.com/apache/spark/pull/28147> for review.

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …

Thanks!

On Fri, Jun 4, 2021 at 1:46 PM Walaa Eldin Moustafa <wa.moust...@gmail.com> wrote:

> Considering the API aspect, the ViewCatalog API sounds like a good idea. A view catalog will enable us to integrate Coral <https://engineering.linkedin.com/blog/2020/coral> (our view SQL translation and management layer) very cleanly to Spark. Currently we can only do it by maintaining our special version of the HiveExternalCatalog. Considering that views can be expanded syntactically without necessarily invoking the analyzer, using a dedicated view API can make performance better if performance is the concern. Further, a catalog can still be both a table and view provider if it chooses to based on this design, so I do not think we necessarily lose the ability of providing both. Looking forward to more discussions on this and making views a powerful tool in Spark.
>
> Thanks,
> Walaa.
>
>
> On Wed, May 26, 2021 at 9:54 AM John Zhuge <jzh...@apache.org> wrote:
>
>> Looks like we are running in circles. Should we have an online meeting to get this sorted out?
>>
>> Thanks,
>> John
>>
>> On Wed, May 26, 2021 at 12:01 AM Wenchen Fan <cloud0...@gmail.com> wrote:
>>
>>> OK, then I'd vote for TableViewCatalog, because
>>> 1. This is how Hive catalog works, and we need to migrate Hive catalog to the v2 API sooner or later.
>>> 2. Because of 1, TableViewCatalog is easy to support in the current table/view resolution framework.
>>> 3. It's better to avoid name conflicts between table and views at the API level, instead of relying on the catalog implementation.
>>> 4. Caching invalidation is always a tricky problem.
>>>
>>> On Tue, May 25, 2021 at 3:09 AM Ryan Blue <rb...@netflix.com.invalid> wrote:
>>>
>>>> I don't think that it makes sense to discuss a different approach in the PR rather than in the vote. Let's discuss this now since that's the purpose of an SPIP.
>>>>
>>>> On Mon, May 24, 2021 at 11:22 AM John Zhuge <jzh...@apache.org> wrote:
>>>>
>>>>> Hi everyone, I’d like to start a vote for the ViewCatalog design proposal (SPIP).
>>>>>
>>>>> The proposal is to add a ViewCatalog interface that can be used to load, create, alter, and drop views in DataSourceV2.
>>>>>
>>>>> The full SPIP doc is here:
>>>>> https://docs.google.com/document/d/1XOxFtloiMuW24iqJ-zJnDzHl2KMxipTjJoxleJFz66A/edit?usp=sharing
>>>>>
>>>>> Please vote on the SPIP in the next 72 hours. Once it is approved, I’ll update the PR for review.
>>>>>
>>>>> [ ] +1: Accept the proposal as an official SPIP
>>>>> [ ] +0
>>>>> [ ] -1: I don’t think this is a good idea because …
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>
>> --
>> John Zhuge

-- 
John Zhuge