I tend to agree with Val that key-value support seems excessive. My suggestion is to treat Ignite as a SQL database for this specific integration and implement only the relevant functionality.
—
Denis

> On Oct 5, 2017, at 5:41 PM, Valentin Kulichenko
> <[email protected]> wrote:
>
> Nikolay,
>
> I don't think we need this, especially with this kind of syntax, which is
> very confusing. The main use case for data frames is SQL, so let's
> concentrate on it. We should use Ignite's SQL engine capabilities as much
> as possible. If we see other use cases down the road, we can always
> support them.
>
> -Val
>
> On Thu, Oct 5, 2017 at 10:57 AM, Николай Ижиков <[email protected]>
> wrote:
>
>> Hello, Valentin.
>>
>> I implemented the ability to run Spark SQL queries against both:
>>
>> 1. An Ignite SQL table. Internally, the table is described by a
>> QueryEntity with meta information about the data.
>> 2. A key-value cache: a regular Ignite cache without meta information
>> about the stored data.
>>
>> In the second case, we have to know which types the cache stores,
>> so for this case I propose the syntax described below.
>>
>> 2017-10-05 20:45 GMT+03:00 Valentin Kulichenko <
>> [email protected]>:
>>
>>> Nikolay,
>>>
>>> I don't understand. Why do we require key and value types to be
>>> provided in SQL? What is the issue you're trying to solve with this
>>> syntax?
>>>
>>> -Val
>>>
>>> On Thu, Oct 5, 2017 at 7:05 AM, Николай Ижиков <[email protected]>
>>> wrote:
>>>
>>>> Hello, guys.
>>>>
>>>> I'm working on IGNITE-3084 [1] "Spark Data Frames Support in Apache
>>>> Ignite" and have a proposal to discuss.
>>>>
>>>> I want to provide a consistent way to query Ignite key-value caches
>>>> from the Spark SQL engine.
>>>>
>>>> To implement it, I have to determine the Java classes for the key and
>>>> the value. They are required for calculating the schema of a Spark
>>>> Data Frame. As far as I know, there is no meta information for a
>>>> key-value cache in Ignite for now.
>>>>
>>>> If a regular data source is used, a user can provide the key class
>>>> and the value class through options. Example:
>>>>
>>>> ```
>>>> val df = spark.read
>>>>   .format(IGNITE)
>>>>   .option("config", CONFIG)
>>>>   .option("cache", CACHE_NAME)
>>>>   .option("keyClass", "java.lang.Long")
>>>>   .option("valueClass", "java.lang.String")
>>>>   .load()
>>>>
>>>> df.printSchema()
>>>>
>>>> df.createOrReplaceTempView("testCache")
>>>>
>>>> val igniteDF = spark.sql(
>>>>   "SELECT key, value FROM testCache WHERE key = 2 AND value like '%0'")
>>>> ```
>>>>
>>>> But if we use the Ignite implementation of the Spark catalog, we
>>>> don't want to register existing caches by hand.
>>>> Anton Vinogradov proposes a syntax that I personally like very much:
>>>>
>>>> *Let's use the following table name for a key-value cache:
>>>> `cacheName[keyClass,valueClass]`*
>>>>
>>>> Example:
>>>>
>>>> ```
>>>> val df3 = igniteSession.sql(
>>>>   "SELECT * FROM `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")
>>>>
>>>> df3.printSchema()
>>>>
>>>> df3.show()
>>>> ```
>>>>
>>>> Thoughts?
>>>>
>>>> [1] https://issues.apache.org/jira/browse/IGNITE-3084
>>>>
>>>> --
>>>> Nikolay Izhikov
>>>> [email protected]
>>>
>>
>> --
>> Nikolay Izhikov
>> [email protected]
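
For reference on case 1 in Nikolay's message: an Ignite SQL table is a cache whose key and value structure is declared up front through a QueryEntity, so a Data Frame schema can be derived from cache metadata alone, without the `keyClass`/`valueClass` options shown above. A minimal sketch of such a configuration; the `person` cache name, the `Person` value type, and its fields are made up for illustration:

```
import org.apache.ignite.Ignition
import org.apache.ignite.cache.QueryEntity
import org.apache.ignite.configuration.CacheConfiguration

// Declare the value fields up front so Ignite SQL (and hence the
// Data Frame integration) can read the schema from cache metadata.
// The cache name, value type, and fields are illustrative only.
val fields = new java.util.LinkedHashMap[String, String]()
fields.put("name", classOf[String].getName)
fields.put("age", classOf[java.lang.Integer].getName)

val entity = new QueryEntity(classOf[java.lang.Long].getName, "Person")
entity.setFields(fields)

val cfg = new CacheConfiguration[java.lang.Long, AnyRef]("person")
cfg.setQueryEntities(java.util.Collections.singletonList(entity))

val ignite = Ignition.start()
val cache = ignite.getOrCreateCache(cfg)
```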

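For the second case, the proposed `cacheName[keyClass,valueClass]` convention only needs to be recognized when the catalog resolves a table name. A rough sketch of that parsing step; everything here (`KeyValueTableName`, the tuple result) is illustrative, not part of any existing Ignite or Spark API:

```
// A sketch of how the proposed table-name convention could be parsed
// on the catalog side. Names and result shape are hypothetical.
object KeyValueTableName {
  // Matches e.g. `testCache[java.lang.Integer,java.lang.String]`.
  private val Pattern = """([^\[\]]+)\[([^,\]]+),([^,\]]+)\]""".r

  /** Returns (cacheName, keyClass, valueClass) if the name follows the
    * `cacheName[keyClass,valueClass]` convention, None otherwise. */
  def unapply(tableName: String): Option[(String, String, String)] =
    tableName match {
      case Pattern(cache, keyClass, valueClass) =>
        Some((cache.trim, keyClass.trim, valueClass.trim))
      case _ => None
    }
}

// Usage:
// "testCache[java.lang.Integer,java.lang.String]" match {
//   case KeyValueTableName(cache, k, v) => println(s"$cache -> $k / $v")
//   case other => println(s"plain table name: $other")
// }
```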