I tend to agree with Val that key-value support seems excessive. My suggestion is to treat Ignite as a SQL database for this specific integration and implement only the relevant functionality.
—
Denis

> On Oct 5, 2017, at 5:41 PM, Valentin Kulichenko
> <[email protected]> wrote:
>
> Nikolay,
>
> I don't think we need this, especially with this kind of syntax, which is
> very confusing. The main use case for data frames is SQL, so let's
> concentrate on it. We should use Ignite's SQL engine capabilities as much
> as possible. If we see other use cases down the road, we can always
> support them.
>
> -Val
>
> On Thu, Oct 5, 2017 at 10:57 AM, Николай Ижиков <[email protected]>
> wrote:
>
>> Hello, Valentin.
>>
>> I implemented the ability to run Spark SQL queries against both:
>>
>> 1. An Ignite SQL table. Internally, the table is described by a
>> QueryEntity with meta information about the data.
>> 2. A key-value cache: a regular Ignite cache without meta information
>> about the stored data.
>>
>> In the second case, we have to know which types the cache stores,
>> so for this case I propose the syntax described below.
>>
>> 2017-10-05 20:45 GMT+03:00 Valentin Kulichenko <
>> [email protected]>:
>>
>>> Nikolay,
>>>
>>> I don't understand. Why do we require key and value types to be
>>> provided in SQL? What is the issue you're trying to solve with this
>>> syntax?
>>>
>>> -Val
>>>
>>> On Thu, Oct 5, 2017 at 7:05 AM, Николай Ижиков <[email protected]>
>>> wrote:
>>>
>>>> Hello, guys.
>>>>
>>>> I'm working on IGNITE-3084 [1] "Spark Data Frames Support in Apache
>>>> Ignite" and have a proposal to discuss.
>>>>
>>>> I want to provide a consistent way to query Ignite key-value caches
>>>> from the Spark SQL engine.
>>>>
>>>> To implement it, I have to determine the Java classes for the key and
>>>> the value. They are required for calculating the schema of a Spark
>>>> Data Frame. As far as I know, there is no meta information for a
>>>> key-value cache in Ignite for now.
>>>>
>>>> If a regular data source is used, a user can provide the key class
>>>> and the value class through options. Example:
>>>>
>>>> ```
>>>> val df = spark.read
>>>>   .format(IGNITE)
>>>>   .option("config", CONFIG)
>>>>   .option("cache", CACHE_NAME)
>>>>   .option("keyClass", "java.lang.Long")
>>>>   .option("valueClass", "java.lang.String")
>>>>   .load()
>>>>
>>>> df.printSchema()
>>>>
>>>> df.createOrReplaceTempView("testCache")
>>>>
>>>> val igniteDF = spark.sql(
>>>>   "SELECT key, value FROM testCache WHERE key = 2 AND value like '%0'")
>>>> ```
>>>>
>>>> But if we use the Ignite implementation of the Spark catalog, we
>>>> don't want to register existing caches by hand.
>>>> Anton Vinogradov proposes a syntax that I personally like very much:
>>>>
>>>> *Let's use the following table name for a key-value cache:
>>>> `cacheName[keyClass,valueClass]`*
>>>>
>>>> Example:
>>>>
>>>> ```
>>>> val df3 = igniteSession.sql(
>>>>   "SELECT * FROM `testCache[java.lang.Integer,java.lang.String]` WHERE key % 2 = 0")
>>>>
>>>> df3.printSchema()
>>>>
>>>> df3.show()
>>>> ```
>>>>
>>>> Thoughts?
>>>>
>>>> [1] https://issues.apache.org/jira/browse/IGNITE-3084
>>>>
>>>> --
>>>> Nikolay Izhikov
>>>> [email protected]
>>>
>>
>> --
>> Nikolay Izhikov
>> [email protected]
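
For reference on case 1 in Nikolay's message: an Ignite SQL table is a cache whose key and value structure is declared up front through a QueryEntity, so a Data Frame schema can be derived from cache metadata alone, without the `keyClass`/`valueClass` options shown above. A minimal sketch of such a configuration; the `person` cache name, the `Person` value type, and its fields are made up for illustration:

```
import org.apache.ignite.Ignition
import org.apache.ignite.cache.QueryEntity
import org.apache.ignite.configuration.CacheConfiguration

// Declare the value fields up front so Ignite SQL (and hence the
// Data Frame integration) can read the schema from cache metadata.
// The cache name, value type, and fields are illustrative only.
val fields = new java.util.LinkedHashMap[String, String]()
fields.put("name", classOf[String].getName)
fields.put("age", classOf[java.lang.Integer].getName)

val entity = new QueryEntity(classOf[java.lang.Long].getName, "Person")
entity.setFields(fields)

val cfg = new CacheConfiguration[java.lang.Long, AnyRef]("person")
cfg.setQueryEntities(java.util.Collections.singletonList(entity))

val ignite = Ignition.start()
val cache = ignite.getOrCreateCache(cfg)
```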

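For the second case, the proposed `cacheName[keyClass,valueClass]` convention only needs to be recognized when the catalog resolves a table name. A rough sketch of that parsing step; everything here (`KeyValueTableName`, the tuple result) is illustrative, not part of any existing Ignite or Spark API:

```
// A sketch of how the proposed table-name convention could be parsed
// on the catalog side. Names and result shape are hypothetical.
object KeyValueTableName {
  // Matches e.g. `testCache[java.lang.Integer,java.lang.String]`.
  private val Pattern = """([^\[\]]+)\[([^,\]]+),([^,\]]+)\]""".r

  /** Returns (cacheName, keyClass, valueClass) if the name follows the
    * `cacheName[keyClass,valueClass]` convention, None otherwise. */
  def unapply(tableName: String): Option[(String, String, String)] =
    tableName match {
      case Pattern(cache, keyClass, valueClass) =>
        Some((cache.trim, keyClass.trim, valueClass.trim))
      case _ => None
    }
}

// Usage:
// "testCache[java.lang.Integer,java.lang.String]" match {
//   case KeyValueTableName(cache, k, v) => println(s"$cache -> $k / $v")
//   case other => println(s"plain table name: $other")
// }
```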