Hi Val, yes that's correct. I'd be happy to make the change to have the database reference the schema if Nikolay agrees. (I'll first need to do a bit of research into how to obtain the list of all available schemata...)
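[Editor's note: as a starting point for that research, Ignite's SQL engine at the time was H2-based, so one candidate is querying the standard INFORMATION_SCHEMA.SCHEMATA view over the thin JDBC driver. This is only a sketch under that assumption — the JDBC URL is a placeholder and the availability of the view in a given Ignite version is not verified here.]

```scala
// Sketch: enumerate available schemata via Ignite's JDBC driver, assuming
// the H2-backed engine exposes the standard INFORMATION_SCHEMA views.
// The connection URL is a placeholder; this is not verified Ignite API.
import java.sql.DriverManager

object ListSchemata {
  def main(args: Array[String]): Unit = {
    val conn = DriverManager.getConnection("jdbc:ignite:thin://127.0.0.1")
    try {
      val rs = conn.createStatement()
        .executeQuery("SELECT SCHEMA_NAME FROM INFORMATION_SCHEMA.SCHEMATA")
      while (rs.next()) println(rs.getString(1))
    } finally conn.close()
  }
}
```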
Thanks,
Stuart.

On Tue, Aug 21, 2018 at 9:43 PM, Valentin Kulichenko <
valentin.kuliche...@gmail.com> wrote:

> Stuart,
>
> Thanks for pointing this out; I was not aware that we use the Spark
> database concept this way. Actually, this confuses me a lot. As far as I
> understand, the catalog is created in the scope of a particular
> IgniteSparkSession, which in turn is assigned to a particular
> IgniteContext and therefore a single Ignite client. If that's the case, I
> don't think it should be aware of other Ignite clients that are connected
> to other clusters. This doesn't look like correct behavior to me, not to
> mention that with this approach having multiple databases would be a very
> rare case. I believe we should get rid of this logic and use the Ignite
> schema name as the database name in Spark's catalog.
>
> Nikolay, what do you think?
>
> -Val
>
> On Tue, Aug 21, 2018 at 8:17 AM Stuart Macdonald <stu...@stuwee.org>
> wrote:
>
>> Nikolay, Val,
>>
>> The JDBC Spark datasource [1] -- as far as I can tell -- has no
>> ExternalCatalog implementation; it just uses the database specified in
>> the JDBC URL. So I don't believe there is any way to call listTables()
>> or listDatabases() for the JDBC provider.
>>
>> The Hive ExternalCatalog [2] makes the distinction between database and
>> table using the actual database and table mechanisms built into the
>> catalog, which is fine because Hive has a clear distinction and
>> hierarchy of databases and tables.
>>
>> *However*, Ignite already uses the "database" concept in the Ignite
>> ExternalCatalog [3] to mean the name of an Ignite instance. So in Ignite
>> we have instances containing schemas containing tables, while Spark only
>> has the concepts of databases and tables, so it seems we must either
>> ignore one of the three Ignite concepts or combine two of them into
>> database or table. The current implementation in the pull request
>> combines the Ignite schema and table attributes into the Spark table
>> attribute.
>>
>> Stuart.
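[Editor's note: from the Spark user's side, Val's proposal would look roughly like the sketch below. This shows the *proposed* mapping of Ignite schemata to Spark databases, not current behavior; the config path and schema name are placeholders.]

```scala
// Sketch of the proposed mapping: Ignite schemata exposed as Spark databases.
// Hypothetical behavior -- config path and "MYSCHEMA" are placeholders.
import org.apache.spark.sql.ignite.IgniteSparkSession

val spark = IgniteSparkSession.builder()
  .appName("catalog-example")
  .master("local[*]")
  .igniteConfig("/path/to/ignite-config.xml") // placeholder path
  .getOrCreate()

// Under the proposal, this would list Ignite schemata as databases...
spark.catalog.listDatabases().show()
// ...and this would list the tables within one schema:
spark.catalog.listTables("MYSCHEMA").show()
```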
>>
>> [1]
>> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
>> [2]
>> https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala
>> [3]
>> https://github.com/apache/ignite/blob/master/modules/spark/src/main/scala/org/apache/spark/sql/ignite/IgniteExternalCatalog.scala
>>
>> On Tue, Aug 21, 2018 at 9:31 AM, Nikolay Izhikov <nizhi...@apache.org>
>> wrote:
>>
>> > Hello, Stuart.
>> >
>> > Can you do some research and find out how schema is handled in Data
>> > Frames for a regular RDBMS such as Oracle, MySQL, etc.?
>> >
>> > On Mon, 20/08/2018 at 15:37 -0700, Valentin Kulichenko wrote:
>> > > Stuart, Nikolay,
>> > >
>> > > I see that the 'Table' class (returned by the listTables method) has
>> > > a 'database' field. Can we use this one to report the schema name?
>> > >
>> > > In any case, I think we should look into how this is done in data
>> > > source implementations for other databases. Any relational database
>> > > has a notion of schema, and I'm sure Spark integrations take this
>> > > into account somehow.
>> > >
>> > > -Val
>> > >
>> > > On Mon, Aug 20, 2018 at 6:12 AM Nikolay Izhikov <nizhi...@apache.org>
>> > > wrote:
>> > > > Hello, Stuart.
>> > > >
>> > > > Personally, I think we should change the current table naming and
>> > > > return tables in the form `schema.table`.
>> > > >
>> > > > Valentin, could you share your opinion?
>> > > >
>> > > > On Mon, 20/08/2018 at 10:04 +0100, Stuart Macdonald wrote:
>> > > > > Igniters,
>> > > > >
>> > > > > While reviewing the changes for IGNITE-9228 [1,2], Nikolay and I
>> > > > > are discussing whether to introduce a change which may impact
>> > > > > backwards compatibility; Nikolay suggested we take the discussion
>> > > > > to this list.
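[Editor's note: on Nikolay's question -- in Spark's JDBC datasource, an RDBMS schema is simply baked into the `dbtable` option as a qualified name; there is no catalog listing involved. A minimal sketch, where the connection URL and table names are placeholders:]

```scala
// Sketch: how a regular RDBMS schema reaches Spark's JDBC datasource.
// The schema is part of the dbtable option; no listTables()/listDatabases().
// URL, credentials, and names below are placeholders.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("jdbc-schema-example")
  .getOrCreate()

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://localhost/testdb") // placeholder URL
  .option("dbtable", "mySchema.myTable")               // schema-qualified name
  .load()
```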
>> > > > >
>> > > > > Ignite implements a custom Spark catalog which provides an API
>> > > > > by which Spark users can list the tables available in Ignite
>> > > > > that can be queried via Spark SQL. Currently that table list
>> > > > > includes just the names of the tables, but IGNITE-9228
>> > > > > introduces a change which allows optional prefixing of schema
>> > > > > names to table names to disambiguate multiple tables with the
>> > > > > same name in different schemas. For the "list tables" API we
>> > > > > therefore have two options:
>> > > > >
>> > > > > 1. List the tables under both their table names and
>> > > > > schema-qualified table names (e.g. [ "myTable",
>> > > > > "mySchema.myTable" ]) even though they are the same underlying
>> > > > > table. This retains backwards compatibility with users who
>> > > > > expect "myTable" to appear in the catalog.
>> > > > > 2. List the tables using only their schema-qualified names.
>> > > > > This eliminates duplication of names in the catalog but will
>> > > > > potentially break compatibility with users who expect the plain
>> > > > > table name in the catalog.
>> > > > >
>> > > > > With either option we will allow Spark SQL SELECT statements to
>> > > > > use either table names or schema-qualified table names; this
>> > > > > change would purely impact the API which is used to list
>> > > > > available tables.
>> > > > >
>> > > > > Any opinions would be welcome.
>> > > > >
>> > > > > Thanks,
>> > > > > Stuart.
>> > > > >
>> > > > > [1] https://issues.apache.org/jira/browse/IGNITE-9228
>> > > > > [2] https://github.com/apache/ignite/pull/4551
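[Editor's note: the two listing options above can be sketched as pure name-building logic. The types and function names here are invented purely for illustration; they are not Ignite API.]

```scala
// Invented helper type and functions to illustrate the two listing options.
case class IgniteTable(schema: String, name: String)

// Option 1: plain names plus schema-qualified names (backwards compatible,
// but the same underlying table appears twice).
def listOption1(tables: Seq[IgniteTable]): Seq[String] =
  tables.flatMap(t => Seq(t.name, s"${t.schema}.${t.name}")).distinct

// Option 2: schema-qualified names only (no duplication, but breaks callers
// that expect the bare table name in the catalog).
def listOption2(tables: Seq[IgniteTable]): Seq[String] =
  tables.map(t => s"${t.schema}.${t.name}")

val ts = Seq(IgniteTable("mySchema", "myTable"),
             IgniteTable("otherSchema", "myTable"))
println(listOption1(ts)) // List(myTable, mySchema.myTable, otherSchema.myTable)
println(listOption2(ts)) // List(mySchema.myTable, otherSchema.myTable)
```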