Re: Spark DataFrames With Cache Key and Value Objects

Nikolay Izhikov Fri, 27 Jul 2018 06:19:27 -0700

Sure.

Please send the ticket number in this thread.


Fri, 27 Jul 2018, 16:16 Stuart Macdonald <stu...@stuwee.org>:

> Thanks Nikolay. For both options if the cache object isn’t a simple type,
> we’d probably do something like this in our Ignite SQL statement:
>
> select cast(_key as binary), cast(_val as binary), ...
>
> Which would give us the BinaryObject’s byte[], then for option 1 we keep
> the Ignite format and introduce a new Spark Encoder for Ignite binary types
> (https://spark.apache.org/docs/2.1.0/api/java/org/apache/spark/sql/Encoder.html),
> so that the end user interface would be something like:
>
> IgniteSparkSession session = ...
> Dataset<Row> dataFrame = ...
> Dataset<MyValClass> valDataSet =
> dataFrame.select("_val").as(session.binaryObjectEncoder(MyValClass.class))
>
> Or for option 2 we have a behind-the-scenes Ignite-to-Kryo UDF so that the
> user interface would be standard Spark:
>
> Dataset<Row> dataFrame = ...
> Dataset<MyValClass> dataSet =
> dataFrame.select("_val").as(Encoders.kryo(MyValClass.class))
>
> I’ll create a ticket and maybe put together a test case for further
> discussion?
>
> Stuart.
>
> On 27 Jul 2018, at 09:50, Nikolay Izhikov <nizhi...@apache.org> wrote:
>
> Hello, Stuart.
>
> I like your idea.
>
> 1. Ignite BinaryObjects, in which case we’d need to supply a Spark Encoder
> implementation for BinaryObjects
>
> 2. Kryo-serialised versions of the objects.
>
>
> It seems the first option is a simple adapter. Am I right?
> If so, I think it's more efficient than transforming
> each object into some other (Kryo) format.
>
> Can you provide some additional links for both options?
> Where can I find the API and/or examples?
>
> As a second step, we can apply the same approach to regular key-value
> caches.
>
> Feel free to create a ticket.
>
> On Fri, 27/07/2018 at 09:37 +0100, Stuart Macdonald wrote:
>
> Ignite Dev Community,
>
> Within Ignite-supplied Spark DataFrames, I’d like to propose adding support
> for _key and _val columns which represent the cache key and value objects
> similar to the current _key/_val column semantics in Ignite SQL.
>
> If the cache key or value objects are standard SQL types (eg. String, Int,
> etc) they will be represented as such in the DataFrame schema, otherwise
> they are represented as Binary types encoded as either: 1. Ignite
> BinaryObjects, in which case we’d need to supply a Spark Encoder
> implementation for BinaryObjects, or 2. Kryo-serialised versions of the
> objects. Option 1 would probably be more efficient but option 2 would be
> more idiomatic Spark.
>
> This feature would be controlled with an optional parameter in the Ignite
> data source, defaulting to the current implementation which doesn’t supply
> _key or _val columns. The rationale behind this is the same as the Ignite
> SQL _key and _val columns: to allow access to the full cache objects from a
> SQL context.
>
> Can I ask for feedback on this proposal please?
>
> I’d be happy to contribute this feature if we agree on the concept.
>
> Stuart.
>
