Denis,

Nikolay was doing final changes and TC stabilization. I'm planning to do
the final review this week, so hopefully we will merge the code soon.

-Val

On Mon, Dec 4, 2017 at 1:31 PM, Denis Magda <dma...@apache.org> wrote:

> Nikolay, Val,
>
> Since we agreed to release the feature without the strategy support, can
> the current integration meet the world in the 2.4 release? Please chime in
> on this conversation:
> http://apache-ignite-developers.2346864.n4.nabble.com/Time-and-scope-for-Apache-Ignite-2-4-td24987.html
>
> —
> Denis
>
> > On Nov 28, 2017, at 5:42 PM, Valentin Kulichenko <
> valentin.kuliche...@gmail.com> wrote:
> >
> > Denis,
> >
> > Agree. I will do the final review in the next few days and merge the code.
> >
> > -Val
> >
> > On Tue, Nov 28, 2017 at 5:28 PM, Denis Magda <dma...@apache.org> wrote:
> >
> >> Guys,
> >>
> >> Looking into the parallel discussion about the strategy support I would
> >> change my initial stance and support the idea of releasing the integration
> >> in its current state. Is the code ready to be merged into the master? Let’s
> >> concentrate on this first and handle the strategy support as a separate
> >> JIRA task. Agree?
> >>
> >> —
> >> Denis
> >>
> >>> On Nov 27, 2017, at 3:47 PM, Valentin Kulichenko <
> >> valentin.kuliche...@gmail.com> wrote:
> >>>
> >>> Nikolay,
> >>>
> >>> Let's estimate the strategy implementation work, and then decide whether
> >>> to merge the code in its current state or not. If anything is unclear,
> >>> please start a separate discussion.
> >>>
> >>> -Val
> >>>
> >>> On Fri, Nov 24, 2017 at 5:42 AM, Николай Ижиков <
> nizhikov....@gmail.com>
> >>> wrote:
> >>>
> >>>> Hello, Val, Denis.
> >>>>
> >>>>> Personally, I think that we should release the integration only after
> >>>> the strategy is fully supported.
> >>>>
> >>>> I see two major reasons to propose merging the DataFrame API
> >>>> implementation without the custom strategy:
> >>>>
> >>>> 1. My PR is already relatively large. From my experience of interacting
> >>>> with the Ignite community, the bigger a PR becomes, the more committer
> >>>> time is required to review it.
> >>>> So, I propose to move in smaller, but complete, steps here.
> >>>>
> >>>> 2. It is not clear to me what exactly "custom strategy and
> >>>> optimization" includes.
> >>>> It seems that additional discussion is required.
> >>>> I think I can put my thoughts on paper and start a discussion right
> >>>> after the basic implementation is done.
> >>>>
> >>>>> Custom strategy implementation is actually very important for this
> >>>> integration.
> >>>>
> >>>> Understood and fully agreed.
> >>>> I'm ready to continue work in that area.
> >>>>
> >>>> 23.11.2017 02:15, Denis Magda wrote:
> >>>>
> >>>> Val, Nikolay,
> >>>>>
> >>>>> Personally, I think that we should release the integration only after
> >>>>> the strategy is fully supported. Without the strategy we don’t really
> >>>>> leverage Ignite’s SQL engine, and we introduce redundant data movement
> >>>>> between Ignite and Spark nodes.
> >>>>>
> >>>>> How big is the effort to support the strategy in terms of the amount of
> >>>>> work left? 40%, 60%, 80%?
> >>>>>
> >>>>> —
> >>>>> Denis
> >>>>>
> >>>>> On Nov 22, 2017, at 2:57 PM, Valentin Kulichenko <
> >>>>>> valentin.kuliche...@gmail.com> wrote:
> >>>>>>
> >>>>>> Nikolay,
> >>>>>>
> >>>>>> Custom strategy implementation is actually very important for this
> >>>>>> integration. Basically, it will allow us to create a SQL query for
> >>>>>> Ignite and execute it directly on the cluster. Your current
> >>>>>> implementation only adds a new DataSource, which means that Spark will
> >>>>>> fetch data into its own memory first, and then do most of the work
> >>>>>> (joins, for example) itself. Does it make sense to you? Can you please
> >>>>>> take a look at this and provide your thoughts on how much development
> >>>>>> is implied there?
> >>>>>>
> >>>>>> The current code looks good to me though, and I'm OK if the strategy is
> >>>>>> implemented as a next step in the scope of a separate ticket. I will do
> >>>>>> the final review early next week and will merge it if everything is OK.
> >>>>>>
> >>>>>> -Val
> >>>>>>
> >>>>>> On Thu, Oct 19, 2017 at 7:29 AM, Николай Ижиков <
> >> nizhikov....@gmail.com>
> >>>>>> wrote:
> >>>>>>
> >>>>>> Hello.
> >>>>>>>
> >>>>>>>> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog
> >>>>>>>> implementations and what is the difference?
> >>>>>>>
> >>>>>>> IgniteCatalog is removed.
> >>>>>>>
> >>>>>>>> 5. I don't like that IgniteStrategy and IgniteOptimization have to be
> >>>>>>>> set manually on SQLContext each time it's created....Is there any way to
> >>>>>>>> automate this and improve usability?
> >>>>>>>
> >>>>>>> IgniteStrategy and IgniteOptimization are removed, as they are empty now.
> >>>>>>>
> >>>>>>>> Actually, I think it makes sense to create a builder similar to
> >>>>>>>> SparkSession.builder()...
> >>>>>>>
> >>>>>>> IgniteBuilder added.
> >>>>>>> The syntax looks like:
> >>>>>>>
> >>>>>>> ```
> >>>>>>> val igniteSession = IgniteSparkSession.builder()
> >>>>>>>   .appName("Spark Ignite catalog example")
> >>>>>>>   .master("local")
> >>>>>>>   .config("spark.executor.instances", "2")
> >>>>>>>   .igniteConfig(CONFIG)
> >>>>>>>   .getOrCreate()
> >>>>>>>
> >>>>>>> igniteSession.catalog.listTables().show()
> >>>>>>> ```
> >>>>>>>
> >>>>>>> Please see the updated PR: https://github.com/apache/ignite/pull/2742
> >>>>>>>
> >>>>>>> 2017-10-18 20:02 GMT+03:00 Николай Ижиков <nizhikov....@gmail.com>:
> >>>>>>>
> >>>>>>> Hello, Valentin.
> >>>>>>>>
> >>>>>>>> My answers are below.
> >>>>>>>> Dmitry, do we need to move the discussion to Jira?
> >>>>>>>>
> >>>>>>>>> 1. Why do we have org.apache.spark.sql.ignite package in our codebase?
> >>>>>>>>
> >>>>>>>> As I mentioned earlier, to implement and override the Spark Catalog one
> >>>>>>>> has to use internal (private) Spark API.
> >>>>>>>> So I have to use the package `org.apache.spark.sql.***` to have access
> >>>>>>>> to private classes and variables.
> >>>>>>>>
> >>>>>>>> For example, the SharedState class that stores the link to
> >>>>>>>> ExternalCatalog is declared as `private[sql] class SharedState`, i.e.
> >>>>>>>> package private.
> >>>>>>>>
> >>>>>>>>> Can these classes reside under org.apache.ignite.spark instead?
> >>>>>>>>
> >>>>>>>> No, as long as we want to have our own implementation of
> >>>>>>>> ExternalCatalog.
> >>>>>>>>
> >>>>>>>>> 2. IgniteRelationProvider contains multiple constants which I guess are
> >>>>>>>>> some kind of config options. Can you describe the purpose of each of
> >>>>>>>>> them?
> >>>>>>>>
> >>>>>>>> I extended the comments for these options.
> >>>>>>>> Please see my commit [1] or the PR HEAD.
> >>>>>>>>
> >>>>>>>>> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog
> >>>>>>>>> implementations and what is the difference?
> >>>>>>>>
> >>>>>>>> Good catch, thank you!
> >>>>>>>> After additional research I found that only IgniteExternalCatalog is
> >>>>>>>> required.
> >>>>>>>> I will update the PR to remove IgniteCatalog in a few days.
> >>>>>>>>
> >>>>>>>>> 4. IgniteStrategy and IgniteOptimization are currently no-op. What are
> >>>>>>>>> our plans on implementing them? Also, what exactly is planned in
> >>>>>>>>> IgniteOptimization and what is its purpose?
> >>>>>>>>
> >>>>>>>> Actually, this is a very good question :)
> >>>>>>>> And I need advice from experienced community members here:
> >>>>>>>>
> >>>>>>>> The purpose of `IgniteOptimization` is to modify the query plan created
> >>>>>>>> by Spark.
> >>>>>>>> Currently, we have one optimization, described in IGNITE-3084 [2] by
> >>>>>>>> you, Valentin :) :
> >>>>>>>>
> >>>>>>>> “If there are non-Ignite relations in the plan, we should fall back to
> >>>>>>>> native Spark strategies”
> >>>>>>>>
> >>>>>>>> I think we can go a little further and reduce a join of two
> >>>>>>>> Ignite-backed Data Frames into a single Ignite SQL query. Currently,
> >>>>>>>> this feature is unimplemented.
> >>>>>>>>
> >>>>>>>> *Do we need it now? Or can we postpone it and concentrate on the basic
> >>>>>>>> Data Frame and Catalog implementation?*
> >>>>>>>>
> >>>>>>>> The purpose of `Strategy`, as you correctly mentioned in [2], is to
> >>>>>>>> transform a LogicalPlan into physical operators.
> >>>>>>>> I don’t have ideas on how to use this opportunity. So I think we don’t
> >>>>>>>> need IgniteStrategy.
> >>>>>>>>
> >>>>>>>> Can you or anyone else suggest some optimization strategy to speed up
> >>>>>>>> SQL query execution?
> >>>>>>>>
> >>>>>>>>> 5. I don't like that IgniteStrategy and IgniteOptimization have to be
> >>>>>>>>> set manually on SQLContext each time it's created....Is there any way
> >>>>>>>>> to automate this and improve usability?
> >>>>>>>>
> >>>>>>>> These classes are added to `extraOptimizations` when one uses
> >>>>>>>> IgniteSparkSession.
> >>>>>>>> As far as I know, there is no way to automatically add these classes to
> >>>>>>>> a regular SparkSession.
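
For reference, a minimal sketch of the manual registration being discussed, using Spark's experimental hook on a regular SparkSession. `PassThroughOptimization` is a hypothetical no-op rule, not the PR's IgniteOptimization, and the session settings are placeholders.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical no-op rule: returns every logical plan unchanged.
object PassThroughOptimization extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

object ExtraOptimizationSketch {
  // Appends the rule to the session's user-provided optimizer rules.
  def register(spark: SparkSession): Unit =
    spark.experimental.extraOptimizations =
      spark.experimental.extraOptimizations :+ PassThroughOptimization

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("extra optimization sketch")
      .master("local")
      .getOrCreate()

    register(spark)        // the step a user would otherwise have to remember
    spark.range(5).show()  // queries now pass through the extra rule
    spark.stop()
  }
}
```

A session subclass or builder could call something like `register` once on creation, so users would not have to repeat it by hand.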
> >>>>>>>>
> >>>>>>>>> 6. What is the purpose of IgniteSparkSession? I see it's used in
> >>>>>>>>> IgniteCatalogExample but not in IgniteDataFrameExample, which is
> >>>>>>>>> confusing.
> >>>>>>>>
> >>>>>>>> The DataFrame API is a *public* Spark API, so anyone can provide an
> >>>>>>>> implementation and plug it into Spark. That’s why
> >>>>>>>> IgniteDataFrameExample doesn’t need any Ignite-specific session.
> >>>>>>>>
> >>>>>>>> The Catalog API is an *internal* Spark API. There is no way to plug a
> >>>>>>>> custom catalog implementation into Spark [3]. So we have to use
> >>>>>>>> `IgniteSparkSession`, which extends the regular SparkSession and
> >>>>>>>> overrides the links to `ExternalCatalog`.
> >>>>>>>>
> >>>>>>>>> 7. To create IgniteSparkSession we first create IgniteContext. Is it
> >>>>>>>>> really needed? It looks like we can directly provide the configuration
> >>>>>>>>> file; if IgniteSparkSession really requires IgniteContext, it can
> >>>>>>>>> create it by itself under the hood.
> >>>>>>>>
> >>>>>>>> Actually, IgniteContext is the base class for the Ignite <-> Spark
> >>>>>>>> integration for now, so I tried to reuse it here. I like the idea of
> >>>>>>>> removing explicit usage of IgniteContext.
> >>>>>>>> I will implement it in a few days.
> >>>>>>>>
> >>>>>>>>> Actually, I think it makes sense to create a builder similar to
> >>>>>>>>> SparkSession.builder()...
> >>>>>>>>
> >>>>>>>> Great idea! I will implement such a builder in a few days.
> >>>>>>>>
> >>>>>>>>> 9. Do I understand correctly that IgniteCacheRelation is for the case
> >>>>>>>>> when we don't have SQL configured on Ignite side?
> >>>>>>>>
> >>>>>>>> Yes, IgniteCacheRelation is the Data Frame implementation for a
> >>>>>>>> key-value cache.
> >>>>>>>>
> >>>>>>>>> I thought we decided not to support this, no? Or this is something
> >>>>>>>>> else?
> >>>>>>>>
> >>>>>>>> My understanding is the following:
> >>>>>>>>
> >>>>>>>> 1. We can’t support automatic resolving of key-value caches in
> >>>>>>>> *ExternalCatalog*, because there is no way to reliably detect the key
> >>>>>>>> and value classes.
> >>>>>>>>
> >>>>>>>> 2. We can support key-value caches in the regular Data Frame
> >>>>>>>> implementation, because we can require the user to provide the key and
> >>>>>>>> value classes explicitly.
> >>>>>>>>
> >>>>>>>>> 8. Can you clarify the query syntax in
> >>>>>>>>> IgniteDataFrameExample#nativeSparkSqlFromCacheExample2?
> >>>>>>>>
> >>>>>>>> Key-value cache:
> >>>>>>>>
> >>>>>>>> key - java.lang.Long,
> >>>>>>>> value - case class Person(name: String, birthDate: java.util.Date)
> >>>>>>>>
> >>>>>>>> Schema of the data frame for the cache is:
> >>>>>>>>
> >>>>>>>> key - long
> >>>>>>>> value.name - string
> >>>>>>>> value.birthDate - date
> >>>>>>>>
> >>>>>>>> So we can select data from the cache:
> >>>>>>>>
> >>>>>>>> SELECT
> >>>>>>>> key, `value.name`, `value.birthDate`
> >>>>>>>> FROM
> >>>>>>>> testCache
> >>>>>>>> WHERE key >= 2 AND `value.name` like '%0'
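
The column naming above can be sketched in plain Scala. This is only an illustration of the assumed naming scheme ("key" plus one "value.<field>" column per field of the value class); `KeyValueSchemaSketch` and `columnNames` are hypothetical and are not taken from the PR.

```scala
import java.util.Date

// Value class from the example above.
final case class Person(name: String, birthDate: Date)

object KeyValueSchemaSketch {
  // Assumed naming scheme: "key" plus one "value.<field>" column per declared field.
  def columnNames(valueClass: Class[_]): Seq[String] =
    "key" +: valueClass.getDeclaredFields.toSeq
      .map(_.getName)
      .filterNot(_.contains("$")) // skip compiler-generated fields
      .map(field => s"value.$field")

  def main(args: Array[String]): Unit =
    columnNames(classOf[Person]).foreach(println)
}
```

Because the resulting column names contain a dot, the query has to wrap them in backticks; otherwise Spark SQL would read `value.name` as field `name` of a column `value`.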
> >>>>>>>>
> >>>>>>>> [1] https://github.com/apache/ignite/pull/2742/commits/faf3ed6febf417bc59b0519156fd4d09114c8da7
> >>>>>>>> [2] https://issues.apache.org/jira/browse/IGNITE-3084?focusedCommentId=15794210&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15794210
> >>>>>>>> [3] https://issues.apache.org/jira/browse/SPARK-17767?focusedCommentId=15543733&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15543733
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> 18.10.2017 04:39, Dmitriy Setrakyan wrote:
> >>>>>>>>
> >>>>>>>>> Val, thanks for the review. Can I ask you to add the same comments to
> >>>>>>>>> the ticket?
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 17, 2017 at 3:20 PM, Valentin Kulichenko <
> >>>>>>>>> valentin.kuliche...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> Nikolay, Anton,
> >>>>>>>>>>
> >>>>>>>>>> I did a high level review of the code. First of all, impressive
> >>>>>>>>>> results! However, I have some questions/comments.
> >>>>>>>>>>
> >>>>>>>>>> 1. Why do we have org.apache.spark.sql.ignite package in our codebase?
> >>>>>>>>>> Can these classes reside under org.apache.ignite.spark instead?
> >>>>>>>>>> 2. IgniteRelationProvider contains multiple constants which I guess are
> >>>>>>>>>> some kind of config options. Can you describe the purpose of each of
> >>>>>>>>>> them?
> >>>>>>>>>> 3. IgniteCatalog vs. IgniteExternalCatalog. Why do we have two Catalog
> >>>>>>>>>> implementations and what is the difference?
> >>>>>>>>>> 4. IgniteStrategy and IgniteOptimization are currently no-op. What are
> >>>>>>>>>> our plans on implementing them? Also, what exactly is planned in
> >>>>>>>>>> IgniteOptimization and what is its purpose?
> >>>>>>>>>> 5. I don't like that IgniteStrategy and IgniteOptimization have to be
> >>>>>>>>>> set manually on SQLContext each time it's created. This seems to be
> >>>>>>>>>> very error prone. Is there any way to automate this and improve
> >>>>>>>>>> usability?
> >>>>>>>>>> 6. What is the purpose of IgniteSparkSession? I see it's used in
> >>>>>>>>>> IgniteCatalogExample but not in IgniteDataFrameExample, which is
> >>>>>>>>>> confusing.
> >>>>>>>>>> 7. To create IgniteSparkSession we first create IgniteContext. Is it
> >>>>>>>>>> really needed? It looks like we can directly provide the configuration
> >>>>>>>>>> file; if IgniteSparkSession really requires IgniteContext, it can
> >>>>>>>>>> create it by itself under the hood. Actually, I think it makes sense to
> >>>>>>>>>> create a builder similar to SparkSession.builder(); it would be good if
> >>>>>>>>>> our APIs here are consistent with Spark APIs.
> >>>>>>>>>> 8. Can you clarify the query syntax in
> >>>>>>>>>> IgniteDataFrameExample#nativeSparkSqlFromCacheExample2?
> >>>>>>>>>> 9. Do I understand correctly that IgniteCacheRelation is for the case
> >>>>>>>>>> when we don't have SQL configured on Ignite side? I thought we decided
> >>>>>>>>>> not to support this, no? Or this is something else?
> >>>>>>>>>>
> >>>>>>>>>> Thanks!
> >>>>>>>>>>
> >>>>>>>>>> -Val
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 17, 2017 at 4:40 AM, Anton Vinogradov <
> >>>>>>>>>> avinogra...@gridgain.com>
> >>>>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> Sounds awesome.
> >>>>>>>>>>>
> >>>>>>>>>>> I'll try to review API & tests this week.
> >>>>>>>>>>>
> >>>>>>>>>>> Val,
> >>>>>>>>>>> Your review is still required :)
> >>>>>>>>>>>
> >>>>>>>>>>> On Tue, Oct 17, 2017 at 2:36 PM, Николай Ижиков <
> >>>>>>>>>>> nizhikov....@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Yes
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Oct 17, 2017, at 2:34 PM, "Anton Vinogradov" <
> >>>>>>>>>>>> avinogra...@gridgain.com> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>>> Nikolay,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> So, it will be possible to start regular Spark and Ignite clusters
> >>>>>>>>>>>>> and, using peer classloading via spark-context, perform any
> >>>>>>>>>>>>> DataFrame request, correct?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On Tue, Oct 17, 2017 at 2:25 PM, Николай Ижиков <
> >>>>>>>>>>>>> nizhikov....@gmail.com> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>> Hello, Anton.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> The example you provided is a path to a file *local* to the master.
> >>>>>>>>>>>>>> These libraries are added to the classpath of each remote node
> >>>>>>>>>>>>>> running the submitted job.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Please see the documentation:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addJar(java.lang.String)
> >>>>>>>>>>>>>> http://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#addFile(java.lang.String)
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> 2017-10-17 13:10 GMT+03:00 Anton Vinogradov <avinogra...@gridgain.com>:
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> Nikolay,
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> With Data Frame API implementation there are no requirements to
> >>>>>>>>>>>>>>>> have any Ignite files on spark worker nodes.
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> What do you mean? I see code like:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> spark.sparkContext.addJar(MAVEN_HOME +
> >>>>>>>>>>>>>>> "/org/apache/ignite/ignite-core/2.3.0-SNAPSHOT/ignite-core-2.3.0-SNAPSHOT.jar")
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>> On Mon, Oct 16, 2017 at 5:22 PM, Николай Ижиков <
> >>>>>>>>>>>>>>> nizhikov....@gmail.com> wrote:
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Hello, guys.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I have created an example application to run Ignite Data Frame on a
> >>>>>>>>>>>>>>>> standalone Spark cluster.
> >>>>>>>>>>>>>>>> With the Data Frame API implementation there are no requirements to
> >>>>>>>>>>>>>>>> have any Ignite files on Spark worker nodes.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> I ran this application on a free dataset: ATP tennis match
> >>>>>>>>>>>>>>>> statistics.
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> data - https://github.com/nizhikov/atp_matches
> >>>>>>>>>>>>>>>> app - https://github.com/nizhikov/ignite-spark-df-example
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> Valentin, do you have a chance to look at my changes?
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> 2017-10-12 6:03 GMT+03:00 Valentin Kulichenko <
> >>>>>>>>>>>>>>>> valentin.kuliche...@gmail.com>:
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Hi Nikolay,
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> Sorry for the delay on this, got a little swamped lately. I will do
> >>>>>>>>>>>>>>>>> my best to review the code this week.
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> -Val
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>> On Mon, Oct 9, 2017 at 11:48 AM, Николай Ижиков <
> >>>>>>>>>>>>>>>>> nizhikov....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Hello, Valentin.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Did you have a chance to look at my changes?
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> Now I think I have done almost all required features.
> >>>>>>>>>>>>>>>>>> I want to run some performance tests to ensure my implementation
> >>>>>>>>>>>>>>>>>> works properly with a significant amount of data.
> >>>>>>>>>>>>>>>>>> And I definitely need some feedback on my changes.
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> 2017-10-09 18:45 GMT+03:00 Николай Ижиков <nizhikov....@gmail.com>:
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Hello, guys.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Which version of Spark do we want to use?
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 1. Currently, Ignite depends on Spark 2.1.0.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>    * Can be run on JDK 7.
> >>>>>>>>>>>>>>>>>>>    * Still supported: 2.1.2 will be released soon.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 2. Latest Spark version is 2.2.0.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>    * Can be run only on JDK 8+
> >>>>>>>>>>>>>>>>>>>    * Released Jul 11, 2017.
> >>>>>>>>>>>>>>>>>>>    * Already supported by huge vendors (Amazon, for example).
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> Note that in IGNITE-3084 I implement some internal Spark API,
> >>>>>>>>>>>>>>>>>>> so it will take some effort to switch between Spark 2.1 and 2.2.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> 2017-09-27 2:20 GMT+03:00 Valentin Kulichenko <
> >>>>>>>>>>>>>>>>>>> valentin.kuliche...@gmail.com>:
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> I will review in the next few days.
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> -Val
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>> On Tue, Sep 26, 2017 at 2:23 PM, Denis Magda <dma...@apache.org>
> >>>>>>>>>>>>>>>>>>>> wrote:
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Hello Nikolay,
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> This is good news. Finally this capability is coming to Ignite.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Val, Vladimir, could you do a preliminary review?
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> Answering your questions.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 1. Yardstick should be enough for performance measurements. As a
> >>>>>>>>>>>>>>>>>>>>> Spark user, I will be curious to know what’s the point of this
> >>>>>>>>>>>>>>>>>>>>> integration. Probably we need to compare Spark + Ignite and
> >>>>>>>>>>>>>>>>>>>>> Spark + Hive or Spark + RDBMS cases.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> 2. If Spark community is reluctant let’s include the module in
> >>>>>>>>>>>>>>>>>>>>> ignite-spark integration.
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> —
> >>>>>>>>>>>>>>>>>>>>> Denis
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>> On Sep 25, 2017, at 11:14 AM, Николай Ижиков <
> >>>>>>>>>>>>>>>>>>>>> nizhikov....@gmail.com> wrote:
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Hello, guys.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Currently, I’m working on integration between Spark and Ignite [1].
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> For now, I have implemented the following:
> >>>>>>>>>>>>>>>>>>>>>>   * Ignite DataSource implementation (IgniteRelationProvider)
> >>>>>>>>>>>>>>>>>>>>>>   * DataFrame support for Ignite SQL table.
> >>>>>>>>>>>>>>>>>>>>>>   * IgniteCatalog implementation for transparent resolving of
> >>>>>>>>>>>>>>>>>>>>>>     Ignite's SQL tables.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The implementation can be found in PR [2].
> >>>>>>>>>>>>>>>>>>>>>> It would be great if someone provided feedback on the prototype.
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I made some examples in the PR so you can see how the API is
> >>>>>>>>>>>>>>>>>>>>>> supposed to be used [3], [4].
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> I need some advice. Can you help me?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> 1. How should this PR be tested?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Of course, I need to provide some unit tests. But what about
> >>>>>>>>>>>>>>>>>>>>>> scalability tests, etc.?
> >>>>>>>>>>>>>>>>>>>>>> Maybe we need some Yardstick benchmark or similar?
> >>>>>>>>>>>>>>>>>>>>>> What are your thoughts?
> >>>>>>>>>>>>>>>>>>>>>> Which scenarios should I consider in the first place?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> 2. Should we provide Spark Catalog implementation inside Ignite
> >>>>>>>>>>>>>>>>>>>>>> codebase?
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> The current implementation of Spark Catalog is based on *internal
> >>>>>>>>>>>>>>>>>>>>>> Spark API*.
> >>>>>>>>>>>>>>>>>>>>>> Spark community seems not interested in making Catalog API public
> >>>>>>>>>>>>>>>>>>>>>> or including Ignite Catalog in Spark code base [5], [6].
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> *Should we include Spark internal API implementation inside Ignite
> >>>>>>>>>>>>>>>>>>>>>> code base?*
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> Or should we consider including the Catalog implementation in some
> >>>>>>>>>>>>>>>>>>>>>> external module that will be created and released outside Ignite?
> >>>>>>>>>>>>>>>>>>>>>> (We can still support and develop it inside the Ignite community.)
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-3084
> >>>>>>>>>>>>>>>>>>>>>> [2] https://github.com/apache/ignite/pull/2742
> >>>>>>>>>>>>>>>>>>>>>> [3] https://github.com/apache/ignite/pull/2742/files#diff-f4ff509cef3018e221394474775e0905
> >>>>>>>>>>>>>>>>>>>>>> [4] https://github.com/apache/ignite/pull/2742/files#diff-f2b670497d81e780dfd5098c5dd8a89c
> >>>>>>>>>>>>>>>>>>>>>> [5] http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Core-Custom-Catalog-Integration-between-Apache-Ignite-and-Apache-Spark-td22452.html
> >>>>>>>>>>>>>>>>>>>>>> [6] https://issues.apache.org/jira/browse/SPARK-17767
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>>>>> Nikolay Izhikov
> >>>>>>>>>>>>>>>>>>>>>> nizhikov....@gmail.com
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>>> Nikolay Izhikov
> >>>>>>>>>>>>>>>>>>> nizhikov....@gmail.com
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>>>> Nikolay Izhikov
> >>>>>>>>>>>>>>>>>> nizhikov....@gmail.com
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>> --
> >>>>>>>>>>>>>>>> Nikolay Izhikov
> >>>>>>>>>>>>>>>> nizhikov....@gmail.com
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> --
> >>>>>>>>>>>>>> Nikolay Izhikov
> >>>>>>>>>>>>>> nizhikov....@gmail.com
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>> --
> >>>>>>> Nikolay Izhikov
> >>>>>>> nizhikov....@gmail.com
> >>>>>>>
> >>>>>>>
> >>>>>
> >>
> >>
>
>
