[ 
https://issues.apache.org/jira/browse/IGNITE-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427949#comment-16427949
 ] 

Nikolay Izhikov edited comment on IGNITE-7077 at 4/6/18 4:25 AM:
-----------------------------------------------------------------

[~vkulichenko]

First, thank you for paying attention to my task!

> I thought this task implied implementation of Strategy to convert Spark's 
> logical plan to physical plan that would be executed directly on Ignite as a 
> SQL query. 

1. {{LogicalPlan}} is a tree representation of the SQL query.

2. Each leaf of the {{LogicalPlan}} is a wrapper for {{IgniteSQLRelation}}.

3. As far as I can see, we don't need to provide a custom {{SparkStrategy}}, at 
least not for the query optimization described in this issue. 
{{SparkStrategy}} produces Spark 'physical' operations: how to select 
and process data from the underlying data sources optimally.
This differs from {{LogicalPlan}}, which is a tree representation of the SQL query.
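To make the distinction concrete, here is a toy sketch in Scala. The types below are simplified stand-ins invented for illustration, not the real Catalyst classes: a strategy maps a logical node to physical execution operators, while an optimization rule rewrites the logical tree itself.

```scala
// Toy stand-ins for illustration only; NOT the real Catalyst classes.
sealed trait LogicalPlan
case object TableScan extends LogicalPlan

sealed trait SparkPlan // a "physical" operator
case object PhysicalScan extends SparkPlan

// A SparkStrategy answers "how to execute": LogicalPlan => physical operators.
def strategy(plan: LogicalPlan): Seq[SparkPlan] = plan match {
  case TableScan => Seq(PhysicalScan)
}

// An optimization rule stays inside the logical tree: LogicalPlan => LogicalPlan.
// (Identity here; a real rule rewrites subtrees of the plan.)
def optimizationRule(plan: LogicalPlan): LogicalPlan = plan
```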

> Here I see the implementation of Optimization. Can you please clarify why is 
> that and what is the difference? How the current implementation work?

My implementation does the following:

1. Transforms part (or the whole) of the {{LogicalPlan}} from the bottom up and 
pushes every supported SQL operator into the Ignite accumulator. 
When an operator is pushed into the accumulator, its tree node is removed from 
the plan, because it will be executed internally in Ignite.

2. If an unsupported operator (a UDF or similar) is found at some level, then 
that level and all upper layers of the {{LogicalPlan}} remain unchanged. 

3. Creates a SQL query from the accumulator. It will be executed directly in Ignite.

4. Replaces the accumulator with an {{IgniteSQLAccumulatorRelation}}, which is a 
relation containing the resulting SQL query.

After step 4, we have a {{LogicalPlan}} that executes all supported SQL 
operators directly in Ignite.
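The four steps above can be sketched with a simplified, self-contained model. The case classes below are toy stand-ins for Catalyst plan nodes, invented for illustration; the real implementation works on {{org.apache.spark.sql.catalyst.plans.logical.LogicalPlan}}.

```scala
// Toy plan nodes; simplified stand-ins for Catalyst's LogicalPlan tree.
sealed trait Node
case class Relation(table: String) extends Node
case class Filter(cond: String, child: Node) extends Node
case class Project(cols: Seq[String], child: Node) extends Node
case class Sort(col: String, child: Node) extends Node
case class Unsupported(name: String, child: Node) extends Node // e.g. a UDF call

// Accumulator that collects pushed-down operators and renders the Ignite SQL.
case class SqlAccumulator(table: String,
                          cols: Seq[String] = Seq("*"),
                          where: Option[String] = None,
                          orderBy: Option[String] = None) {
  def toSql: String =
    s"SELECT ${cols.mkString(", ")} FROM $table" +
      where.map(" WHERE " + _).getOrElse("") +
      orderBy.map(" ORDER BY " + _).getOrElse("")
}

// Bottom-up transform: supported operators are folded into the accumulator
// (their tree nodes disappear from the plan); the first unsupported operator
// stops the push-down, so it and everything above it stay unchanged (Left).
def pushDown(plan: Node): Either[Node, SqlAccumulator] = plan match {
  case Relation(t)        => Right(SqlAccumulator(t))
  case Filter(c, child)   => pushDown(child).map(_.copy(where = Some(c)))
  case Project(cs, child) => pushDown(child).map(_.copy(cols = cs))
  case Sort(c, child)     => pushDown(child).map(_.copy(orderBy = Some(c)))
  case u: Unsupported     => Left(u)
}
```

For a plan like the CITY example below, {{pushDown(Sort("id", Project(Seq("ID", "NAME"), Filter("id > 1", Relation("CITY")))))}} yields an accumulator whose {{toSql}} is {{SELECT ID, NAME FROM CITY WHERE id > 1 ORDER BY id}}, which step 4 would wrap into the replacement relation.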

> Also I think we should add some examples demonstrating the new functionality.

If it works correctly, we don't need any new examples, because the 
implementation details are hidden from the user.
Here is what changed:

This is how the Spark plan looks *before* Ignite optimization:

{noformat}
== Analyzed Logical Plan ==
id: bigint, name: string
Sort [id#180L ASC NULLS FIRST], true
+- Project [id#180L, name#179]
   +- Filter (id#180L > cast(1 as bigint))
      +- SubqueryAlias city
         +- Relation[NAME#179,ID#180L] IgniteSQLRelation[table=CITY]
{noformat}

After optimization, we get the following plan. Please note that the query inside 
{{IgniteSQLAccumulatorRelation}} is identical to the query represented by the 
previous plan.

{noformat}
== Optimized Logical Plan ==
Relation[ID#180L,NAME#179] IgniteSQLAccumulatorRelation(columns=[ID, NAME], 
qry=SELECT ID, NAME FROM CITY WHERE id IS NOT NULL AND id > 1 ORDER BY id)
{noformat}

Please see the debug output of the IgniteOptimization*Spec tests 
({{IgniteOptimizationSpec}} or {{IgniteOptimizationStringFuncSpec}}, for example).
You can find additional examples of {{LogicalPlan}} transformation there.


> Spark Data Frame Support. Strategy to convert complete query to Ignite SQL
> --------------------------------------------------------------------------
>
>                 Key: IGNITE-7077
>                 URL: https://issues.apache.org/jira/browse/IGNITE-7077
>             Project: Ignite
>          Issue Type: New Feature
>          Components: spark
>    Affects Versions: 2.3
>            Reporter: Nikolay Izhikov
>            Assignee: Nikolay Izhikov
>            Priority: Major
>              Labels: bigdata
>             Fix For: 2.5
>
>
> Basic support of Spark Data Frame for Ignite implemented in IGNITE-3084.
> We need to implement custom spark strategy that can convert whole Spark SQL 
> query to Ignite SQL Query if query consists of only Ignite tables.
> The strategy does nothing if spark query includes not only Ignite tables.
> Memsql implementation can be taken as an example - 
> https://github.com/memsql/memsql-spark-connector



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
