Hi Nikolay,

No, it is not possible to get this info from the public API, nor do we plan
to expose it. See IGNITE-4509 and commit *fbf0e353* to get a better
understanding of how this was implemented.
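For context: what is internal here is the partition set that the SQL engine
derives from a query. The plain key-to-partition mapping, on the other hand,
is public through the Affinity API. A minimal sketch in Scala (the cache name
"person" is a made-up example, not an existing cache):

    import org.apache.ignite.Ignition

    object AffinityLookup extends App {
      val ignite   = Ignition.start()                // start an Ignite node
      val affinity = ignite.affinity[Int]("person")  // public Affinity API; "person" is a made-up cache name

      val part = affinity.partition(42)              // partition owning key 42
      val node = affinity.mapPartitionToNode(part)   // primary node for that partition

      println(s"key 42 -> partition $part on node ${node.id()}")
    }
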
Vladimir.

On Wed, Nov 29, 2017 at 2:01 PM, Николай Ижиков <nizhikov....@gmail.com> wrote:

> Hello, Vladimir.
>
> > partition pruning is already implemented in Ignite, so there is no need
> > to do this on your own.
>
> Spark works with partitioned data sets.
> A custom Data Source (Ignite) is required to provide data partition
> information to Spark.
>
> Can I get information about pruned partitions through some public API?
> Is there a plan or ticket to implement such an API?
>
> 2017-11-29 10:34 GMT+03:00 Vladimir Ozerov <voze...@gridgain.com>:
>
> > Nikolay,
> >
> > Regarding p.3: partition pruning is already implemented in Ignite, so
> > there is no need to do this on your own.
> >
> > On Wed, Nov 29, 2017 at 3:23 AM, Valentin Kulichenko <
> > valentin.kuliche...@gmail.com> wrote:
> >
> > > Nikolay,
> > >
> > > A custom strategy allows us to fully process the AST generated by
> > > Spark and convert it to Ignite SQL, so there will be no execution on
> > > the Spark side at all. This is what we are trying to achieve here.
> > > Basically, one will be able to use the DataFrame API to execute
> > > queries directly on Ignite. Does that make sense to you?
> > >
> > > I would recommend taking a look at the MemSQL implementation, which
> > > does similar stuff: https://github.com/memsql/memsql-spark-connector
> > >
> > > Note that this approach will work only if all relations included in
> > > the AST are Ignite tables. Otherwise, the strategy should return null
> > > so that Spark falls back to its regular mode. Ignite will be used as
> > > a regular data source in this case, and it's probably possible to
> > > implement some optimizations here as well. However, I have never
> > > investigated this, and it seems like a separate discussion.
> > >
> > > -Val
> > >
> > > On Tue, Nov 28, 2017 at 9:54 AM, Николай Ижиков <
> > > nizhikov....@gmail.com> wrote:
> > >
> > > > Hello, guys.
> > > >
> > > > I have implemented basic support of the Spark Data Frame API [1],
> > > > [2] for Ignite.
> > > > Spark provides an API for a custom strategy to optimize queries
> > > > from Spark to the underlying data source (Ignite).
> > > >
> > > > The goals of optimization (obvious, just to be on the same page):
> > > > Minimize data transfer between Spark and Ignite.
> > > > Speed up query execution.
> > > >
> > > > I see 3 ways to optimize queries:
> > > >
> > > > 1. *Join Reduce* If one makes a query that joins two or more Ignite
> > > > tables, we have to pass all the join info to Ignite and transfer
> > > > only the result of the join back to Spark.
> > > > To implement it, we have to extend the current implementation with
> > > > a new RelationProvider that can generate all kinds of joins for two
> > > > or more tables. We should also add some tests.
> > > > The question is: how should the join result be partitioned?
> > > >
> > > > 2. *Order by* If one makes a query to an Ignite table with an ORDER
> > > > BY clause, we can execute the sorting on the Ignite side.
> > > > But it seems that Spark currently has no way to be told that
> > > > partitions are already sorted.
> > > >
> > > > 3. *Key filter* If one makes a query with `WHERE key = XXX` or
> > > > `WHERE key IN (X, Y, Z)`, we can reduce the number of partitions
> > > > and query only the partitions that store the given key values.
> > > > Is this kind of optimization already built into Ignite, or should I
> > > > implement it myself?
> > > >
> > > > Maybe there is another way to make queries run faster?
> > > >
> > > > [1] https://spark.apache.org/docs/latest/sql-programming-guide.html
> > > > [2] https://github.com/apache/ignite/pull/2742
>
>
> --
> Nikolay Izhikov
> nizhikov....@gmail.com
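
P.S. On the custom strategy Val describes above: here is a minimal sketch
(Scala) of the shape it could take. Only the Strategy contract and the
registration through spark.experimental.extraStrategies are real Spark API;
the IgniteStrategy name and the toIgniteSql stub are hypothetical
placeholders, not existing Ignite code.

    import org.apache.spark.sql.{SparkSession, Strategy}
    import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    import org.apache.spark.sql.execution.SparkPlan

    // Hypothetical sketch, not existing Ignite code.
    object IgniteStrategy extends Strategy {

      override def apply(plan: LogicalPlan): Seq[SparkPlan] =
        toIgniteSql(plan) match {
          case Some(sql) =>
            // A real implementation would return a physical node here that
            // runs `sql` on Ignite and exposes the result to Spark.
            Nil // placeholder: plug in an Ignite-backed SparkPlan
          case None =>
            // Not all relations in the AST are Ignite tables: returning an
            // empty Seq makes Spark fall back to its regular planning.
            Nil
        }

      // Hypothetical stub: Some(sql) only when the whole AST (relations,
      // filters, joins) can be rendered as a single Ignite SQL query.
      private def toIgniteSql(plan: LogicalPlan): Option[String] = None
    }

    object RegisterStrategy extends App {
      val spark = SparkSession.builder().master("local").getOrCreate()

      // Prepend the strategy so the planner consults it before the built-ins.
      spark.experimental.extraStrategies =
        IgniteStrategy +: spark.experimental.extraStrategies
    }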
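
And on Nikolay's point 3 (key filter): combining the public Affinity API with
SqlFieldsQuery#setPartitions (assuming your Ignite version supports it) lets
one query only the partitions that own the requested keys. A sketch, again
with a made-up "person" cache and Person table:

    import org.apache.ignite.Ignition
    import org.apache.ignite.cache.query.SqlFieldsQuery
    import scala.collection.JavaConverters._

    object KeyFilterPruning extends App {
      val ignite   = Ignition.start()
      val cache    = ignite.cache[Int, AnyRef]("person")  // "person" is a made-up cache name
      val affinity = ignite.affinity[Int]("person")

      // Keys extracted from a `WHERE key IN (1, 5, 42)` filter.
      val keys = Seq(1, 5, 42)

      // Map each key to its owning partition; query only those partitions.
      val parts = keys.map(k => affinity.partition(k)).distinct.sorted

      val qry = new SqlFieldsQuery(
          "SELECT name FROM Person WHERE _key IN (?, ?, ?)")
        .setArgs(keys.map(Int.box): _*)
        .setPartitions(parts: _*)

      cache.query(qry).getAll.asScala.foreach(row => println(row.get(0)))
    }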