Re: DataSourceV2 community sync #3

2018-12-03 Thread Thakrar, Jayesh
Thank you, Ryan and Xiao – sharing all this info really gives very good insight! From: Ryan Blue Reply-To: "rb...@netflix.com" Date: Monday, December 3, 2018 at 12:05 PM To: "Thakrar, Jayesh" Cc: Xiao Li, Spark Dev List Subject: Re: DataSourceV2 community sync #3 Ja

Re: DataSourceV2 community sync #3

2018-12-03 Thread Ryan Blue
esh" Cc: Ryan Blue, "u...@spark.apache.org" <dev@spark.apache.org> Subject: Re: DataSourceV2 community sync #3 Hi, Jayesh, This is a good question. Spark is a unified analytics engine for various data sources. We

Re: DataSourceV2 community sync #3

2018-12-03 Thread Thakrar, Jayesh
To: "Thakrar, Jayesh" Cc: Ryan Blue, "u...@spark.apache.org" Subject: Re: DataSourceV2 community sync #3 Hi, Jayesh, This is a good question. Spark is a unified analytics engine for various data sources. We are able to get the table schema from the underlying data sources

Re: DataSourceV2 community sync #3

2018-12-03 Thread Ryan Blue
park catalog be the common denominator of the other catalogs (least featured) or a super-feature catalog? From: Xiao Li Date: Saturday, December 1, 2018 at 10:49 PM To: Ryan Blue Cc: "u...@spark.apache.org" Subject: Re: DataSourceV2 co

Re: DataSourceV2 community sync #3

2018-12-03 Thread Ryan Blue
Do you agree with my definition of catalog in Spark SQL? I think we agree on what a catalog is: A service that can manage the metadata and definitions of databases, views, tables, functions, roles, etc. External objects accessed through our data source APIs are called “tables”. I do not think we wi

Re: DataSourceV2 community sync #3

2018-12-03 Thread Xiao Li
d the Spark catalog be the common denominator of the other catalogs (least featured) or a super-feature catalog? From: Xiao Li Date: Saturday, December 1, 2018 at 10:49 PM To: Ryan Blue Cc: "u...@spark.apache.org" Subject:

Re: DataSourceV2 community sync #3

2018-12-01 Thread Thakrar, Jayesh
Blue Cc: "u...@spark.apache.org" Subject: Re: DataSourceV2 community sync #3 Hi, Ryan, Let us first focus on answering the most fundamental problem before discussing various related topics. What is a catalog in Spark SQL? My definition of catalog is based on the database catalog. Basi

Re: DataSourceV2 community sync #3

2018-12-01 Thread Xiao Li
Hi, Ryan, Let us first focus on answering the most fundamental problem before discussing various related topics. What is a catalog in Spark SQL? My definition of catalog is based on the database catalog. Basically, the catalog provides a service that manages the metadata/definitions of database ob

Re: DataSourceV2 community sync #3

2018-12-01 Thread Ryan Blue
I try to avoid discussing each specific topic about catalog federation before we decide on the framework for multi-catalog support. I’ve tried to open discussions on this for the last 6+ months because we need it. I understand that you’d like a comprehensive plan for supporting more than one ca

Re: DataSourceV2 community sync #3

2018-12-01 Thread Xiao Li
Hi, Ryan, I try to avoid discussing each specific topic about catalog federation before we decide on the framework for multi-catalog support. - *CatalogTableIdentifier*: The PR https://github.com/apache/spark/pull/21978 does nothing but add an interface. In the PR, we did not discuss h

Re: DataSourceV2 community sync #3

2018-12-01 Thread Ryan Blue
Xiao, I do have opinions about how multi-catalog support should work, but I don't think we are at a point where there is consensus. That's why I've started discussion threads and added the CatalogTableIdentifier PR instead of a comprehensive design doc. You have opinions about how users should int

Re: DataSourceV2 community sync #3

2018-12-01 Thread Xiao Li
Hi, Ryan, I have to emphasize that the catalog is a really important component of Spark SQL, or of any analytics platform. Thus, a careful design is needed to ensure it works as expected. Based on my previous discussion with many community members, Spark SQL needs a catalog interface so that we can mount mul

Re: DataSourceV2 community sync #3

2018-11-29 Thread Ryan Blue
Xiao, For the questions in this last email about how catalogs interact and how functions and other future features work: we discussed those last night. As I said then, I think that the right approach is incremental. We don’t want to design all of that in one gigantic proposal up front. To do that

Re: DataSourceV2 community sync #3

2018-11-29 Thread Xiao Li
Ryan, All the proposals I have read relate only to table metadata. A catalog contains the metadata of databases, functions, columns, views, and so on. When we have multiple catalogs, how do these catalogs interact with each other? How does the global catalog work? How a view, table, function, database and col

Re: DataSourceV2 community sync #3

2018-11-29 Thread Ryan Blue
Xiao, Please have a look at the pull requests and documents I've posted over the last few months. If you still have questions about how you might plug in Glue, let me know and I can clarify. rb On Thu, Nov 29, 2018 at 2:56 PM Xiao Li wrote: > Ryan, Thanks for leading the discussion and se

Re: DataSourceV2 community sync #3

2018-11-29 Thread Xiao Li
Ryan, Thanks for leading the discussion and sending out the memo!
> Xiao suggested that there are restrictions for how tables and functions
> interact. Because of this, he doesn’t think that separate TableCatalog and
> FunctionCatalog APIs are feasible.
Anything is possible. It depends on how

Re: DataSourceV2 community sync #3

2018-11-29 Thread Ryan Blue
Hi everyone, Here are my notes from last night’s sync. Some attendees who joined during the discussion may be missing, since I made the list while we were waiting for people to join. If you have topic suggestions for the next sync, please start sending them to me. Thank you! *Attendees:* Ryan Blue

Re: DataSourceV2 community sync #3

2018-11-28 Thread Xiao Li
Based on my understanding, we are not inventing anything new here. Basically, we are building a federated database system, especially once we support multiple catalogs. There are many mature commercial products on the market. For example, https://www.ibm.com/support/knowledgecenter/en/SSEPGG_10.5

Re: DataSourceV2 community sync #3

2018-11-28 Thread Wenchen Fan
Hi Ryan, Thanks for hosting the discussion! I think the table catalog is super useful, but since this is the first time we allow users to extend the catalog, it's better to write down some details, from end-user APIs to internal management. 1. How would end-users register/unregister a catalog with SQL AP

Re: DataSourceV2 community sync #3

2018-11-27 Thread JackyLee
+1 Please add me to the Google Hangout invite.

Re: DataSourceV2 community sync #3

2018-11-27 Thread Martin Junghanns
Hi Ryan, I would like to be added to the Google Hangout invite. Thank you. Cheers, Martin On 26.11.18 23:54, Ryan Blue wrote: Hi everyone, I just sent out an invite for the next DSv2 community sync for Wednesday, 28 Nov at 5PM PST. We have a few topics left over from last time to cover.