Xiao,

Please have a look at the pull requests and documents I've posted over the
last few months.

If you still have questions about how you might plug in Glue, let me know
and I can clarify.

rb

On Thu, Nov 29, 2018 at 2:56 PM Xiao Li <gatorsm...@gmail.com> wrote:

> Ryan,
>
> Thanks for leading the discussion and sending out the memo!
>
>
>> Xiao suggested that there are restrictions for how tables and functions
>> interact. Because of this, he doesn’t think that separate TableCatalog and
>> FunctionCatalog APIs are feasible.
>
>
> Anything is possible. It depends on how we design the two interfaces. Now,
> most parts are unknown to me without seeing the design.
>
> I think we need to see the user stories, and high-level design before
> working on a small portion of Catalog federation. We do not need an
> exhaustive design in the current stage, but we need to know how the new
> proposal works. For example, how to plug in a new Hive metastore? How to
> plug in a Glue? How do users implement a new external catalog without
> adding any new data sources? Without knowing more details, it is hard to
> say whether this TableCatalog can satisfy all the requirements.
>
> Cheers,
>
> Xiao
>
>
> On Thu, Nov 29, 2018 at 2:32 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
>
>> Hi everyone,
>>
>> Here are my notes from last night’s sync. Some attendees who joined
>> during the discussion may be missing, since I made the list while we
>> were waiting for people to join.
>>
>> If you have topic suggestions for the next sync, please start sending
>> them to me. Thank you!
>>
>> *Attendees:*
>>
>> Ryan Blue
>> John Zhuge
>> Jamison Bennett
>> Yuanjian Li
>> Xiao Li
>> stczwd
>> Matt Cheah
>> Wenchen Fan
>> Genglian Wang
>> Kevin Yu
>> Maryann Xue
>> Cody Koeninger
>> Bruce Robbins
>> Rohit Karlupia
>>
>> *Agenda:*
>>
>>    - Follow-up issues or discussion on Wenchen’s PR #23086
>>    - TableCatalog proposal
>>    - CatalogTableIdentifier
>>
>> *Notes:*
>>
>>    - Discussion about PR #23086
>>       - Where should the catalog API live since it needs to be
>>       accessible to catalyst rules, but the catalyst module is private?
>>       - Wenchen suggested creating a sql-api module for v2 API
>>       interfaces, making catalyst depend on it
>>       - Consensus was to use Wenchen’s suggestion
>>    - In discussion about #23086, Xiao asked how adding catalog to a
>>    table identifier will work
>>       - Background from Ryan: existing code paths use TableIdentifier
>>       and don’t expect a catalog portion. If an identifier with a
>>       catalog were passed to existing code, that code may use the
>>       default catalog not knowing that a different one was requested,
>>       which would be incorrect behavior.
>>       - Ryan: The proposal for CatalogTableIdentifier addresses this
>>       problem. TableIdentifier is used for identifiers that have no
>>       catalog set. By enforcing that requirement, passing a
>>       TableIdentifier to old code ensures that no catalogs leak into
>>       that code. This is also used when the catalog is set from
>>       context. For example, the TableCatalog API accepts only
>>       TableIdentifier because the catalog is already determined.
>>    - Xiao asked whether FunctionIdentifier needs to be updated in the
>>    same way as CatalogTableIdentifier.
>>       - Ryan: Yes, when a FunctionCatalog API is added
>>    - The remaining time was spent discussing whether the plan to
>>    incrementally replace the current catalog API will work. [Not great notes
>>    here, feel free to add your take in a reply]
>>       - Xiao suggested that there are restrictions for how tables and
>>       functions interact. Because of this, he doesn’t think that separate
>>       TableCatalog and FunctionCatalog APIs are feasible.
>>       - Wenchen and Ryan think that functions should be orthogonal to
>>       data sources
>>       - Matt and Ryan think that catalog design can be done
>>       incrementally as new interfaces (e.g. FunctionCatalog) are added
>>       and that the proposed TableCatalog does not preclude designing
>>       for Xiao’s concerns later
>>       - [I forget who] pointed out that there are restrictions in some
>>       databases for views from different sources
>>       - There was some discussion about when functions or views cannot
>>       be orthogonal. For example, where the code runs is important. Functions
>>       pushed to sources cannot necessarily be run on other sources and Spark
>>       functions cannot necessarily be pushed down to sources.
>>       - Xiao would like a full catalog replacement design, including
>>       views, databases, and functions and how they interact, before moving
>>       forward with the proposed TableCatalog API
>>       - Ryan [and Matt, I think] think that TableCatalog is compatible
>>       with future decisions and the best path forward is to build
>>       incrementally. An exhaustive design process blocks progress on v2.
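
To make the identifier notes above concrete, here is a rough Python sketch
(not Spark code). Only the names TableIdentifier, CatalogTableIdentifier and
TableCatalog come from the notes. Everything else (the fields,
as_table_identifier, InMemoryTableCatalog, v1_lookup, resolve, and the "glue"
catalog name) is hypothetical and only illustrates the idea that old code
never sees a catalog it would silently ignore.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TableIdentifier:
    """Identifier that never carries a catalog (stand-in for the existing type)."""
    table: str
    database: Optional[str] = None


@dataclass(frozen=True)
class CatalogTableIdentifier:
    """Identifier that may name a catalog (stand-in for the proposed type)."""
    table: str
    database: Optional[str] = None
    catalog: Optional[str] = None

    def as_table_identifier(self) -> TableIdentifier:
        # Refuse to drop a catalog silently: old code would otherwise fall
        # back to the default catalog without knowing another was requested.
        if self.catalog is not None:
            raise ValueError(f"catalog '{self.catalog}' cannot be passed to v1 code")
        return TableIdentifier(self.table, self.database)


class InMemoryTableCatalog:
    """Hypothetical catalog plugin. Its methods take only TableIdentifier
    because the catalog is already determined once this object is selected."""

    def __init__(self) -> None:
        self._tables: Dict[TableIdentifier, str] = {}

    def create_table(self, ident: TableIdentifier, description: str) -> None:
        self._tables[ident] = description

    def load_table(self, ident: TableIdentifier) -> str:
        return self._tables[ident]


def v1_lookup(ident: TableIdentifier) -> str:
    """Stand-in for an existing code path that only knows the default catalog."""
    return f"v1:{ident.database}.{ident.table}"


def resolve(ident: CatalogTableIdentifier,
            catalogs: Dict[str, InMemoryTableCatalog]) -> str:
    if ident.catalog is None:
        # No catalog requested: safe to hand off to the old code path.
        return v1_lookup(ident.as_table_identifier())
    # Catalog chosen from the identifier; strip it before calling the plugin.
    plugin = catalogs[ident.catalog]
    return plugin.load_table(TableIdentifier(ident.table, ident.database))
```

In this sketch, an identifier without a catalog takes the old path, one that
names "glue" goes to the plugin registered under that name, and converting an
identifier that still has a catalog into a TableIdentifier raises an error
instead of quietly using the default catalog.
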
>>
>>
>> On Mon, Nov 26, 2018 at 2:54 PM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I just sent out an invite for the next DSv2 community sync for
>>> Wednesday, 28 Nov at 5PM PST.
>>>
>>> We have a few topics left over from last time to cover. A few people
>>> wanted to cover catalog APIs, so I put two items on the agenda:
>>>
>>>    - The TableCatalog proposal (and other catalog APIs)
>>>    - Using CatalogTableIdentifier to separate v1 and v2 code paths and
>>>    avoid unintended behavior changes
>>>
>>> As I noted in the summary last time, please send topics ahead of time so
>>> we can get started more quickly.
>>>
>>> If you would like to be added to the google hangout invite, please let
>>> me know and I’ll add you. Thanks!
>>>
>>> rb
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

-- 
Ryan Blue
Software Engineer
Netflix
