Sure. Please send the ticket number in this thread.

On Fri, 27 Jul 2018 at 16:16, Stuart Macdonald <stu...@stuwee.org> wrote:

> Thanks Nikolay. For both options, if the cache object isn’t a simple type,
> we’d probably do something like this in our Ignite SQL statement:
>
> select cast(_key as binary), cast(_val as binary), ...
>
> Which would give us the BinaryObject’s byte[]. Then for option 1 we keep
> the Ignite format and introduce a new Spark Encoder for Ignite binary types
> (https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Encoder.html),
> so that the end user interface would be something like:
>
> IgniteSparkSession session = ...
> Dataset<Row> dataFrame = ...
> Dataset<MyValClass> valDataSet =
>     dataFrame.select("_val").as(session.binaryObjectEncoder(MyValClass.class))
>
> Or for option 2 we have a behind-the-scenes Ignite-to-Kryo UDF so that the
> user interface would be standard Spark:
>
> Dataset<Row> dataFrame = ...
> Dataset<MyValClass> dataSet =
>     dataFrame.select("_val").as(Encoders.kryo(MyValClass.class))
>
> I’ll create a ticket and maybe put together a test case for further
> discussion?
>
> Stuart.
>
> On 27 Jul 2018, at 09:50, Nikolay Izhikov <nizhi...@apache.org> wrote:
>
> > Hello, Stuart.
> >
> > I like your idea.
> >
> > > 1. Ignite BinaryObjects, in which case we’d need to supply a Spark Encoder
> > > implementation for BinaryObjects
> > > 2. Kryo-serialised versions of the objects.
> >
> > It seems the first option is a simple adapter. Am I right?
> > If yes, I think it is more efficient than transforming each object into
> > some other (Kryo) format.
> >
> > Can you provide some additional links for both options?
> > Where can I find the API and/or examples?
> >
> > As a second step, we can apply the same approach to regular key-value
> > caches.
> >
> > Feel free to create a ticket.
> >
> > On Fri, 27/07/2018 at 09:37 +0100, Stuart Macdonald wrote:
> > > Ignite Dev Community,
> > >
> > > Within Ignite-supplied Spark DataFrames, I’d like to propose adding
> > > support for _key and _val columns which represent the cache key and
> > > value objects, similar to the current _key/_val column semantics in
> > > Ignite SQL.
> > >
> > > If the cache key or value objects are standard SQL types (e.g. String,
> > > Int, etc.) they will be represented as such in the DataFrame schema,
> > > otherwise they are represented as Binary types encoded as either:
> > > 1. Ignite BinaryObjects, in which case we’d need to supply a Spark
> > > Encoder implementation for BinaryObjects, or 2. Kryo-serialised
> > > versions of the objects. Option 1 would probably be more efficient but
> > > option 2 would be more idiomatic Spark.
> > >
> > > This feature would be controlled with an optional parameter in the
> > > Ignite data source, defaulting to the current implementation, which
> > > doesn’t supply _key or _val columns. The rationale behind this is the
> > > same as for the Ignite SQL _key and _val columns: to allow access to
> > > the full cache objects from a SQL context.
> > >
> > > Can I ask for feedback on this proposal please?
> > >
> > > I’d be happy to contribute this feature if we agree on the concept.
> > >
> > > Stuart.
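
The schema rule described in the original proposal (standard SQL types keep their type, everything else becomes a Binary column) could be sketched in plain Java roughly as follows. This is only an illustration under assumptions: the class name KeyValColumnTypes, the method columnTypeFor, the type-name strings, and the particular set of classes treated as standard SQL types are all hypothetical and are not part of the Ignite or Spark APIs.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: decides how a cache key or value class would appear
// as a _key/_val column type. Names and type strings are illustrative only.
final class KeyValColumnTypes {
    // Assumed set of "standard SQL types"; the real list would be defined
    // by the Ignite data source, not by this sketch.
    private static final Map<Class<?>, String> SQL_TYPES = new HashMap<>();

    static {
        SQL_TYPES.put(String.class, "STRING");
        SQL_TYPES.put(Integer.class, "INTEGER");
        SQL_TYPES.put(Long.class, "LONG");
        SQL_TYPES.put(Double.class, "DOUBLE");
        SQL_TYPES.put(Boolean.class, "BOOLEAN");
        SQL_TYPES.put(BigDecimal.class, "DECIMAL");
    }

    private KeyValColumnTypes() {
    }

    // Standard SQL types map to themselves; any other class falls back to
    // BINARY (to be decoded later via a BinaryObject or Kryo encoder).
    static String columnTypeFor(Class<?> cacheClass) {
        return SQL_TYPES.getOrDefault(cacheClass, "BINARY");
    }

    public static void main(String[] args) {
        System.out.println("_key -> " + columnTypeFor(Integer.class));
        System.out.println("_val -> " + columnTypeFor(Object.class));
    }
}
```

A lookup table with a BINARY fallback keeps the default path simple: only classes explicitly registered as SQL types get a typed column, so an unknown user class can never be mislabelled.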