[ 
https://issues.apache.org/jira/browse/IGNITE-9108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexey Zinoviev updated IGNITE-9108:
------------------------------------
    Fix Version/s: 2.9

> Spark DataFrames With Cache Key and Value Objects
> -------------------------------------------------
>
>                 Key: IGNITE-9108
>                 URL: https://issues.apache.org/jira/browse/IGNITE-9108
>             Project: Ignite
>          Issue Type: New Feature
>          Components: spark
>    Affects Versions: 2.9
>            Reporter: Stuart Macdonald
>            Assignee: Alexey Zinoviev
>            Priority: Major
>             Fix For: 2.9
>
>
> Add support for _key and _val columns within Ignite-provided Spark 
> DataFrames, which represent the cache key and value objects similar to the 
> current _key/_val column semantics in Ignite SQL.
>  
> If the cache key or value objects are standard SQL types (e.g. String, Int, 
> etc.) they will be represented as such in the DataFrame schema; otherwise they 
> will be represented as Binary types encoded in one of two ways. 1. As Ignite 
> BinaryObjects, in which case we'd need to supply a Spark Encoder 
> implementation for BinaryObjects, e.g.:
>  
> {code:java}
> IgniteSparkSession session = ...
> Dataset<Row> dataFrame = ...
> Dataset<MyValClass> valDataSet = 
>     dataFrame.select("_val").as(session.binaryObjectEncoder(MyValClass.class));
> {code}
> Or 2. as Kryo-serialised versions of the objects, e.g.:
>  
> {code:java}
> Dataset<Row> dataFrame = ...
> Dataset<MyValClass> dataSet = 
>     dataFrame.select("_val").as(Encoders.kryo(MyValClass.class));
> {code}
> Option 1 would probably be more efficient, but option 2 would be more 
> idiomatic Spark.
>  
> The rationale behind this is the same as the Ignite SQL _key and _val 
> columns: to allow access to the full cache objects from a SQL context.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
