Xiao,

Please have a look at the pull requests and documents I've posted over the last few months.
If you still have questions about how you might plug in Glue, let me know and I can clarify.

rb

On Thu, Nov 29, 2018 at 2:56 PM Xiao Li <gatorsm...@gmail.com> wrote:

> Ryan,
>
> Thanks for leading the discussion and sending out the memo!
>
>> Xiao suggested that there are restrictions for how tables and functions
>> interact. Because of this, he doesn’t think that separate TableCatalog and
>> FunctionCatalog APIs are feasible.
>
> Anything is possible. It depends on how we design the two interfaces. Now,
> most parts are unknown to me without seeing the design.
>
> I think we need to see the user stories, and high-level design before
> working on a small portion of Catalog federation. We do not need an
> exhaustive design in the current stage, but we need to know how the new
> proposal works. For example, how to plug in a new Hive metastore? How to
> plug in a Glue? How do users implement a new external catalog without
> adding any new data sources? Without knowing more details, it is hard to
> say whether this TableCatalog can satisfy all the requirements.
>
> Cheers,
>
> Xiao
>
> Ryan Blue <rb...@netflix.com.invalid> wrote on Thu, Nov 29, 2018 at 2:32 PM:
>
>> Hi everyone,
>>
>> Here are my notes from last night’s sync. Some attendees that joined
>> during discussion may be missing, since I made the list while we were
>> waiting for people to join.
>>
>> If you have topic suggestions for the next sync, please start sending
>> them to me. Thank you!
>>
>> *Attendees:*
>>
>> Ryan Blue
>> John Zhuge
>> Jamison Bennett
>> Yuanjian Li
>> Xiao Li
>> stczwd
>> Matt Cheah
>> Wenchen Fan
>> Genglian Wang
>> Kevin Yu
>> Maryann Xue
>> Cody Koeninger
>> Bruce Robbins
>> Rohit Karlupia
>>
>> *Agenda:*
>>
>>    - Follow-up issues or discussion on Wenchen’s PR #23086
>>    - TableCatalog proposal
>>    - CatalogTableIdentifier
>>
>> *Notes:*
>>
>>    - Discussion about PR #23086
>>       - Where should the catalog API live since it needs to be
>>         accessible to catalyst rules, but the catalyst module is private?
>>       - Wenchen suggested creating a sql-api module for v2 API
>>         interfaces, making catalyst depend on it
>>       - Consensus was to use Wenchen’s suggestion
>>    - In discussion about #23086, Xiao asked how adding catalog to a
>>      table identifier will work
>>       - Background from Ryan: existing code paths use TableIdentifier
>>         and don’t expect a catalog portion. If an identifier with a
>>         catalog were passed to existing code, that code may use the
>>         default catalog not knowing that a different one was requested,
>>         which would be incorrect behavior.
>>       - Ryan: The proposal for CatalogTableIdentifier addresses this
>>         problem. TableIdentifier is used for identifiers that have no
>>         catalog set. By enforcing that requirement, passing a
>>         TableIdentifier to old code ensures that no catalogs leak into
>>         that code. This is also used when the catalog is set from
>>         context. For example, the TableCatalog API accepts only
>>         TableIdentifier because the catalog is already determined.
>>    - Xiao asked whether FunctionIdentifier needs to be updated in the
>>      same way as CatalogTableIdentifier.
>>       - Ryan: Yes, when a FunctionCatalog API is added
>>    - The remaining time was spent discussing whether the plan to
>>      incrementally replace the current catalog API will work. [Not great
>>      notes here, feel free to add your take in a reply]
>>       - Xiao suggested that there are restrictions for how tables and
>>         functions interact.
>>         Because of this, he doesn’t think that separate TableCatalog
>>         and FunctionCatalog APIs are feasible.
>>       - Wenchen and Ryan think that functions should be orthogonal to
>>         data sources
>>       - Matt and Ryan think that catalog design can be done
>>         incrementally as new interfaces (i.e. FunctionCatalog) are added
>>         and that the proposed TableCatalog does not preclude designing
>>         for Xiao’s concerns later
>>       - [I forget who] pointed out that there are restrictions in some
>>         databases for views from different sources
>>       - There was some discussion about when functions or views cannot
>>         be orthogonal. For example, where the code runs is important.
>>         Functions pushed to sources cannot necessarily be run on other
>>         sources and Spark functions cannot necessarily be pushed down to
>>         sources.
>>       - Xiao would like a full catalog replacement design, including
>>         views, databases, and functions and how they interact, before
>>         moving forward with the proposed TableCatalog API
>>       - Ryan [and Matt, I think] think that TableCatalog is compatible
>>         with future decisions and the best path forward is to build
>>         incrementally. An exhaustive design process blocks progress on v2.
>>
>>
>> On Mon, Nov 26, 2018 at 2:54 PM Ryan Blue <rb...@netflix.com> wrote:
>>
>>> Hi everyone,
>>>
>>> I just sent out an invite for the next DSv2 community sync for
>>> Wednesday, 28 Nov at 5PM PST.
>>>
>>> We have a few topics left over from last time to cover. A few people
>>> wanted to cover catalog APIs, so I put two items on the agenda:
>>>
>>>    - The TableCatalog proposal (and other catalog APIs)
>>>    - Using CatalogTableIdentifier to separate v1 and v2 code paths and
>>>      avoid unintended behavior changes
>>>
>>> As I noted in the summary last time, please send topics ahead of time so
>>> we can get started more quickly.
>>>
>>> If you would like to be added to the google hangout invite, please let
>>> me know and I’ll add you. Thanks!
>>>
>>> rb
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>

--
Ryan Blue
Software Engineer
Netflix