Nikolay,

> With Data Frame API implementation there are no requirements to have any
> Ignite files on spark worker nodes.
What do you mean? I see code like:

    spark.sparkContext.addJar(MAVEN_HOME +
        "/org/apache/ignite/ignite-core/2.3.0-SNAPSHOT/ignite-core-2.3.0-SNAPSHOT.jar")

On Mon, Oct 16, 2017 at 5:22 PM, Николай Ижиков <nizhikov....@gmail.com> wrote:

> Hello, guys.
>
> I have created an example application to run Ignite Data Frames on a
> standalone Spark cluster.
> With the Data Frame API implementation there is no requirement to have
> any Ignite files on Spark worker nodes.
>
> I ran this application on a free dataset: ATP tennis match statistics.
>
> data - https://github.com/nizhikov/atp_matches
> app - https://github.com/nizhikov/ignite-spark-df-example
>
> Valentin, did you have a chance to look at my changes?
>
> 2017-10-12 6:03 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
>
> > Hi Nikolay,
> >
> > Sorry for the delay on this, got a little swamped lately. I will do my
> > best to review the code this week.
> >
> > -Val
> >
> > On Mon, Oct 9, 2017 at 11:48 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:
> >
> > > Hello, Valentin.
> > >
> > > Did you have a chance to look at my changes?
> > >
> > > Now I think I have implemented almost all required features.
> > > I want to run some performance tests to ensure my implementation
> > > works properly with a significant amount of data.
> > > And I definitely need some feedback on my changes.
> > >
> > > 2017-10-09 18:45 GMT+03:00 Николай Ижиков <nizhikov....@gmail.com>:
> > >
> > > > Hello, guys.
> > > >
> > > > Which version of Spark do we want to use?
> > > >
> > > > 1. Currently, Ignite depends on Spark 2.1.0.
> > > >    * Can run on JDK 7.
> > > >    * Still supported: 2.1.2 will be released soon.
> > > >
> > > > 2. The latest Spark version is 2.2.0.
> > > >    * Can run only on JDK 8+.
> > > >    * Released Jul 11, 2017.
> > > >    * Already supported by major vendors (Amazon, for example).
> > > >
> > > > Note that in IGNITE-3084 I implement some internal Spark API.
> > > > So it will take some effort to switch between Spark 2.1 and 2.2.
> > > >
> > > > 2017-09-27 2:20 GMT+03:00 Valentin Kulichenko <valentin.kuliche...@gmail.com>:
> > > >
> > > > > I will review in the next few days.
> > > > >
> > > > > -Val
> > > > >
> > > > > On Tue, Sep 26, 2017 at 2:23 PM, Denis Magda <dma...@apache.org> wrote:
> > > > >
> > > > > > Hello Nikolay,
> > > > > >
> > > > > > This is good news. Finally this capability is coming to Ignite.
> > > > > >
> > > > > > Val, Vladimir, could you do a preliminary review?
> > > > > >
> > > > > > Answering your questions:
> > > > > >
> > > > > > 1. Yardstick should be enough for performance measurements. As a
> > > > > > Spark user, I will be curious to know what's the point of this
> > > > > > integration. Probably we need to compare the Spark + Ignite case
> > > > > > with Spark + Hive or Spark + RDBMS.
> > > > > >
> > > > > > 2. If the Spark community is reluctant, let's include the module
> > > > > > in the ignite-spark integration.
> > > > > >
> > > > > > --
> > > > > > Denis
> > > > > >
> > > > > > On Sep 25, 2017, at 11:14 AM, Николай Ижиков <nizhikov....@gmail.com> wrote:
> > > > > >
> > > > > > > Hello, guys.
> > > > > > >
> > > > > > > Currently, I'm working on an integration between Spark and
> > > > > > > Ignite [1].
> > > > > > >
> > > > > > > For now, I have implemented the following:
> > > > > > >   * an Ignite DataSource implementation (IgniteRelationProvider);
> > > > > > >   * DataFrame support for Ignite SQL tables;
> > > > > > >   * an IgniteCatalog implementation for transparent resolution
> > > > > > >     of Ignite SQL tables.
> > > > > > >
> > > > > > > The implementation can be found in PR [2].
> > > > > > > It would be great if someone could provide feedback on the
> > > > > > > prototype.
> > > > > > >
> > > > > > > I made some examples in the PR so you can see how the API is
> > > > > > > supposed to be used [3], [4].
> > > > > > >
> > > > > > > I need some advice. Can you help me?
> > > > > > >
> > > > > > > 1. How should this PR be tested?
> > > > > > >
> > > > > > > Of course, I need to provide some unit tests.
> > > > > > > But what about scalability tests, etc.?
> > > > > > > Maybe we need some Yardstick benchmark or similar?
> > > > > > > What are your thoughts?
> > > > > > > Which scenarios should I consider in the first place?
> > > > > > >
> > > > > > > 2. Should we provide the Spark Catalog implementation inside
> > > > > > > the Ignite codebase?
> > > > > > >
> > > > > > > The current implementation of the Spark Catalog is based on
> > > > > > > *internal Spark API*.
> > > > > > > The Spark community seems not interested in making the Catalog
> > > > > > > API public or in including the Ignite Catalog in the Spark
> > > > > > > codebase [5], [6].
> > > > > > >
> > > > > > > *Should we include a Spark internal API implementation inside
> > > > > > > the Ignite codebase?*
> > > > > > >
> > > > > > > Or should we consider including the Catalog implementation in
> > > > > > > some external module that is created and released outside
> > > > > > > Ignite? (We could still support and develop it inside the
> > > > > > > Ignite community.)
> > > > > > >
> > > > > > > [1] https://issues.apache.org/jira/browse/IGNITE-3084
> > > > > > > [2] https://github.com/apache/ignite/pull/2742
> > > > > > > [3] https://github.com/apache/ignite/pull/2742/files#diff-f4ff509cef3018e221394474775e0905
> > > > > > > [4] https://github.com/apache/ignite/pull/2742/files#diff-f2b670497d81e780dfd5098c5dd8a89c
> > > > > > > [5] http://apache-spark-developers-list.1001551.n3.nabble.com/Spark-Core-Custom-Catalog-Integration-between-Apache-Ignite-and-Apache-Spark-td22452.html
> > > > > > > [6] https://issues.apache.org/jira/browse/SPARK-17767
> > > > > > >
> > > > > > > --
> > > > > > > Nikolay Izhikov
> > > > > > > nizhikov....@gmail.com
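
[Editor's note] To make the discussion above concrete, here is a minimal sketch of the usage pattern the thread is debating: shipping the Ignite jar to workers at runtime with `spark.sparkContext.addJar` (so no Ignite files need to be pre-installed on worker nodes) and then reading an Ignite SQL table as a DataFrame through the `IgniteRelationProvider` data source. This is a hedged sketch of a prototype API, not a confirmed public interface: the `"ignite"` format alias, the `config` and `table` option names, the `atp_matches` table, the `winner_rank` column, and the jar path are all assumptions based on this thread and the linked example app.

```scala
import org.apache.spark.sql.SparkSession

object IgniteDataFrameSketch {
  def main(args: Array[String]): Unit = {
    // Assumed standalone-cluster master URL; adjust for your setup.
    val spark = SparkSession.builder()
      .appName("ignite-df-example")
      .master("spark://master:7077")
      .getOrCreate()

    // Ship Ignite classes to the workers at runtime, so the worker nodes
    // need no pre-installed Ignite files (the point made in the thread).
    // The local Maven repository path is a placeholder.
    val mavenRepo = sys.env("HOME") + "/.m2/repository"
    spark.sparkContext.addJar(mavenRepo +
      "/org/apache/ignite/ignite-core/2.3.0-SNAPSHOT/ignite-core-2.3.0-SNAPSHOT.jar")

    // Read an Ignite SQL table as a DataFrame via the prototype
    // IgniteRelationProvider data source; option names are assumptions.
    val matches = spark.read
      .format("ignite")
      .option("config", "ignite-config.xml") // Ignite node config, assumed
      .option("table", "atp_matches")        // Ignite SQL table, assumed
      .load()

    // Standard DataFrame operations then run against Ignite data.
    matches.filter("winner_rank <= 10").show()

    spark.stop()
  }
}
```

With the catalog-based approach also discussed in the thread, the explicit `format`/`option` calls would not be needed: `spark.sql("SELECT ...")` would resolve Ignite tables transparently, which is exactly why it requires internal Spark API.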