Hello, Igniters.

Valentin,
> We never recommend using these fields

Actually, we do:

* Documentation [1]. Please see the "Predefined Fields" section.
* Java example [2]
* DotNet example [3]
* Scala example [4]

> ...hopefully will be removed altogether one day

This is news to me. Do we have specific plans for it?

[1] https://apacheignite-sql.readme.io/docs/schema-and-indexes
[2] https://github.com/apache/ignite/blob/master/examples/src/main/java/org/apache/ignite/examples/sql/SqlDmlExample.java#L88
[3] https://github.com/apache/ignite/blob/master/modules/platforms/dotnet/examples/Apache.Ignite.Examples/Sql/SqlDmlExample.cs#L91
[4] https://github.com/apache/ignite/blob/master/examples/src/main/scala/org/apache/ignite/scalar/examples/ScalarCachePopularNumbersExample.scala#L124
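For context, the documented usage looks roughly like this. This is a trimmed sketch in the spirit of the Java example [2], not its exact code; the Person class and cache name are illustrative. A few more sketches follow at the end of the quoted thread.

import java.util.List;
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;
import org.apache.ignite.cache.query.SqlFieldsQuery;
import org.apache.ignite.cache.query.annotations.QuerySqlField;
import org.apache.ignite.configuration.CacheConfiguration;

public class KeyValFieldsSketch {
    /** Illustrative value class; SQL sees only the annotated field. */
    public static class Person {
        @QuerySqlField
        public String firstName;
    }

    public static void main(String[] args) {
        try (Ignite ignite = Ignition.start()) {
            // Declaring the key/value types makes the cache visible to SQL.
            CacheConfiguration<Long, Person> ccfg =
                new CacheConfiguration<Long, Person>("personCache")
                    .setIndexedTypes(Long.class, Person.class);

            IgniteCache<Long, Person> cache = ignite.getOrCreateCache(ccfg);

            // _key is written directly in DML, as in the documented example.
            cache.query(new SqlFieldsQuery(
                "insert into Person (_key, firstName) values (?, ?)")
                .setArgs(1L, "John"));

            // ...and read back, which is exactly the semantics proposed
            // for the Spark DataFrame integration.
            List<List<?>> rows = cache.query(new SqlFieldsQuery(
                "select _key, firstName from Person")).getAll();

            System.out.println(rows);
        }
    }
}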
On Fri, 2018-07-27 at 15:22 -0700, Valentin Kulichenko wrote:
> Stuart,
>
> The _key and _val fields are quite a dirty hack that was added years ago
> and is virtually never used now. We never recommend using these fields,
> and I would definitely avoid building new features on top of them.
>
> Having said that, I'm not arguing the use case, but we need a better
> implementation approach here. I suggest we think it over and come back
> to this next week :) I'm sure Nikolay will also chime in and share his
> thoughts.
>
> -Val
>
> On Fri, Jul 27, 2018 at 12:39 PM Stuart Macdonald <stu...@stuwee.org> wrote:
> >
> > If your predicates and joins are expressed in Spark SQL, you cannot
> > currently optimise those and also gain access to the key/val objects.
> > If you went without the Ignite Spark SQL optimisations and expressed
> > your query in Ignite SQL, you would still need to use the _key/_val
> > columns. The Ignite documentation has this specific example using the
> > _val column (right at the end):
> > https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd
> >
> > Stuart.
> >
> > On 27 Jul 2018, at 20:05, Valentin Kulichenko
> > <valentin.kuliche...@gmail.com> wrote:
> > >
> > > Well, the second approach would use the optimizations, no?
> > >
> > > -Val
> > >
> > > On Fri, Jul 27, 2018 at 11:49 AM Stuart Macdonald <stu...@stuwee.org> wrote:
> > > >
> > > > Val,
> > > >
> > > > Yes, you can already get access to the cache objects as an RDD or
> > > > Dataset, but you can't use the Ignite-optimised DataFrames with
> > > > these mechanisms. Optimised DataFrames have to be passed through
> > > > Spark SQL's Catalyst engine to allow for predicate pushdown to
> > > > Ignite. So the use case we're talking about here is when we want
> > > > to push Spark filters/joins down to Ignite to optimise, but still
> > > > have access to the underlying cache objects, which is not possible
> > > > currently.
> > > >
> > > > Can you elaborate on the reason the _key and _val columns in
> > > > Ignite SQL will be removed?
> > > >
> > > > Stuart.
> > > >
> > > > On 27 Jul 2018, at 19:39, Valentin Kulichenko
> > > > <valentin.kuliche...@gmail.com> wrote:
> > > > >
> > > > > Stuart, Nikolay,
> > > > >
> > > > > I really don't like the idea of exposing the '_key' and '_val'
> > > > > fields. This is legacy stuff that hopefully will be removed
> > > > > altogether one day. Let's not use it in new features.
> > > > >
> > > > > Actually, I don't think it's even needed. Spark docs [1] suggest
> > > > > two ways of creating a typed dataset:
> > > > > 1. Based on an RDD. This should be supported using IgniteRDD, I
> > > > > believe.
> > > > > 2. Based on a DataFrame, providing a class. This would just work
> > > > > out of the box, I guess.
> > > > >
> > > > > Of course, this needs to be tested and verified, and there might
> > > > > be certain pieces missing to fully support the use case. But
> > > > > generally I like these approaches much more.
> > > > >
> > > > > [1] https://spark.apache.org/docs/2.3.1/sql-programming-guide.html#creating-datasets
> > > > >
> > > > > -Val
> > > > >
> > > > > On Fri, Jul 27, 2018 at 6:31 AM Stuart Macdonald <stu...@stuwee.org> wrote:
> > > > > >
> > > > > > Here's the ticket:
> > > > > > https://issues.apache.org/jira/browse/IGNITE-9108
> > > > > >
> > > > > > Stuart.
> > > > > >
> > > > > > On Friday, 27 July 2018 at 14:19, Nikolay Izhikov wrote:
> > > > > > >
> > > > > > > Sure.
> > > > > > >
> > > > > > > Please send the ticket number in this thread.
> > > > > > >
> > > > > > > Fri, Jul 27, 2018, 16:16 Stuart Macdonald <stu...@stuwee.org>:
> > > > > > > >
> > > > > > > > Thanks, Nikolay. For both options, if the cache object
> > > > > > > > isn't a simple type, we'd probably do something like this
> > > > > > > > in our Ignite SQL statement:
> > > > > > > >
> > > > > > > > select cast(_key as binary), cast(_val as binary), ...
> > > > > > > >
> > > > > > > > which would give us the BinaryObject's byte[]. Then for
> > > > > > > > option 1 we keep the Ignite format and introduce a new
> > > > > > > > Spark Encoder for Ignite binary types
> > > > > > > > (https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Encoder.html),
> > > > > > > > so that the end-user interface would be something like:
> > > > > > > >
> > > > > > > > IgniteSparkSession session = ...
> > > > > > > > Dataset<Row> dataFrame = ...
> > > > > > > > Dataset<MyValClass> valDataSet =
> > > > > > > >     dataFrame.select("_val").as(session.binaryObjectEncoder(MyValClass.class))
> > > > > > > >
> > > > > > > > Or for option 2 we have a behind-the-scenes Ignite-to-Kryo
> > > > > > > > UDF, so that the user interface would be standard Spark:
> > > > > > > >
> > > > > > > > Dataset<Row> dataFrame = ...
> > > > > > > > Dataset<MyValClass> dataSet =
> > > > > > > >     dataFrame.select("_val").as(Encoders.kryo(MyValClass.class))
> > > > > > > >
> > > > > > > > I'll create a ticket and maybe put together a test case
> > > > > > > > for further discussion?
> > > > > > > >
> > > > > > > > Stuart.
> > > > > > > >
> > > > > > > > On 27 Jul 2018, at 09:50, Nikolay Izhikov <nizhi...@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > Hello, Stuart.
> > > > > > > > >
> > > > > > > > > I like your idea.
> > > > > > > > >
> > > > > > > > > > 1. Ignite BinaryObjects, in which case we'd need to
> > > > > > > > > > supply a Spark Encoder implementation for BinaryObjects
> > > > > > > > > > 2. Kryo-serialised versions of the objects.
> > > > > > > > >
> > > > > > > > > Seems like the first option is a simple adapter. Am I
> > > > > > > > > right? If yes, I think it's a more efficient way compared
> > > > > > > > > with transforming each object into some other (Kryo)
> > > > > > > > > format.
> > > > > > > > >
> > > > > > > > > Can you provide some additional links for both options?
> > > > > > > > > Where can I find the API and/or examples?
> > > > > > > > >
> > > > > > > > > As a second step, we can apply the same approach to
> > > > > > > > > regular key-value caches.
> > > > > > > > >
> > > > > > > > > Feel free to create a ticket.
> > > > > > > > >
> > > > > > > > > On Fri, 2018-07-27 at 09:37 +0100, Stuart Macdonald wrote:
> > > > > > > > > >
> > > > > > > > > > Ignite Dev Community,
> > > > > > > > > >
> > > > > > > > > > Within Ignite-supplied Spark DataFrames, I'd like to
> > > > > > > > > > propose adding support for _key and _val columns which
> > > > > > > > > > represent the cache key and value objects, similar to
> > > > > > > > > > the current _key/_val column semantics in Ignite SQL.
> > > > > > > > > >
> > > > > > > > > > If the cache key or value objects are standard SQL
> > > > > > > > > > types (e.g. String, Int, etc.) they will be represented
> > > > > > > > > > as such in the DataFrame schema; otherwise they are
> > > > > > > > > > represented as Binary types encoded as either:
> > > > > > > > > > 1. Ignite BinaryObjects, in which case we'd need to
> > > > > > > > > > supply a Spark Encoder implementation for
> > > > > > > > > > BinaryObjects, or
> > > > > > > > > > 2. Kryo-serialised versions of the objects.
> > > > > > > > > > Option 1 would probably be more efficient, but option 2
> > > > > > > > > > would be more idiomatic Spark.
> > > > > > > > > >
> > > > > > > > > > This feature would be controlled with an optional
> > > > > > > > > > parameter in the Ignite data source, defaulting to the
> > > > > > > > > > current implementation, which doesn't supply _key or
> > > > > > > > > > _val columns. The rationale behind this is the same as
> > > > > > > > > > for the Ignite SQL _key and _val columns: to allow
> > > > > > > > > > access to the full cache objects from a SQL context.
> > > > > > > > > >
> > > > > > > > > > Can I ask for feedback on this proposal, please?
> > > > > > > > > >
> > > > > > > > > > I'd be happy to contribute this feature if we agree on
> > > > > > > > > > the concept.
> > > > > > > > > >
> > > > > > > > > > Stuart.
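As promised above, here are a few rough sketches. Everything below is untested illustration, not working integration code.

First, the user-facing side of Stuart's option 2 (Kryo). The _val column does not exist in the Ignite data source today, and the option names ("config", "table") and MyValClass are assumptions made for the sketch:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ValColumnKryoSketch {
    /** Illustrative value class stored in the cache. */
    public static class MyValClass implements java.io.Serializable {
        public int popularity;
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("val-column-sketch")
            .master("local")
            .getOrCreate();

        // Hypothetical: assumes the Ignite data source exposed a binary
        // _val column with Kryo-serialised values (option 2 in the thread).
        Dataset<Row> dataFrame = spark.read()
            .format("ignite")
            .option("config", "ignite-config.xml") // assumed option name
            .option("table", "person")             // assumed option name
            .load();

        // As sketched in the thread; in practice the selected column may
        // need aliasing to match the single binary "value" column that
        // Encoders.kryo expects.
        Dataset<MyValClass> dataSet = dataFrame
            .select("_val")
            .as(Encoders.kryo(MyValClass.class));

        dataSet.show();
    }
}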
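Second, the "DataFrame plus a class" approach Val suggests is already expressible with stock Spark encoders, but only over the SQL-visible fields, not the full cache object. A minimal fragment, reusing the spark session from the sketch above and assuming Person is a bean with getters/setters matching the table's columns:

// Val's approach 2: derive a typed Dataset from the DataFrame via a bean
// encoder. Works with plain Spark today, but covers only the columns that
// Ignite exposes to SQL, which is the gap this thread is about.
Dataset<Row> df = spark.read()
    .format("ignite")
    .option("config", "ignite-config.xml") // assumed option name
    .option("table", "person")             // assumed option name
    .load();

Dataset<Person> persons = df.as(Encoders.bean(Person.class));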
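Finally, the optional parameter from Stuart's original proposal might surface to users like this; the option name below is invented purely for illustration and does not exist in any release:

// Hypothetical opt-in for _key/_val columns in the Ignite data source.
Dataset<Row> withKeyVal = spark.read()
    .format("ignite")
    .option("config", "ignite-config.xml")
    .option("table", "person")
    .option("keyValColumns", "true") // proposed flag, name made up here
    .load();

withKeyVal.printSchema(); // would show _key and _val as binary columns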