[DISCUSSION] Spark Data Frame through Thin Client

Nikolay Izhikov Sat, 20 Oct 2018 14:09:56 -0700

Hello, Igniters.

Currently, the Spark Data Frame integration is implemented via a client node 
connection. Whenever we need to retrieve data from Ignite into a Spark worker 
(or master), we start a client node.


This approach has several major disadvantages:

        1. We have to copy the whole Ignite distribution onto each Spark 
worker [1].
        2. We have to copy the whole Ignite distribution onto the Spark master 
to get the catalog working.
        3. We have to keep the Ignite configuration file at the same absolute 
path on every worker and provide that path during data frame construction [2].
        4. We have to additionally configure the Spark workers' classpath to 
include the Ignite libraries.

For now, almost all the operations we need in the Spark Data Frame integration 
are supported by the Java Thin Client:
        * obtain the list of caches.
        * get a cache configuration.
        * execute an SQL query.
        * stream data to a table - not supported by the thin client for now, 
but can be implemented using simple SQL INSERT statements.
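
To illustrate the last point, here is a minimal sketch of how rows could be 
written with plain SQL INSERT statements instead of a streamer. It only builds 
the parameterized statement text and splits rows into batches; the class name, 
the method names, the table and the batch size are all hypothetical, and 
identifiers are not escaped. Passing the statement to the thin client's SQL 
query call is left out, so the sketch runs without Ignite on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Ignite: prepares INSERT statements that a
// thin client could execute in place of a data streamer.
class InsertStatementSketch {
    /** Builds "INSERT INTO table (c1, c2) VALUES (?, ?)" for the given columns. */
    static String buildInsert(String table, List<String> columns) {
        if (columns.isEmpty())
            throw new IllegalArgumentException("At least one column is required");

        String cols = String.join(", ", columns);
        // One "?" placeholder per column; values are bound as query arguments.
        String params = String.join(", ", Collections.nCopies(columns.size(), "?"));

        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + params + ")";
    }

    /** Splits rows into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> batches(List<T> rows, int batchSize) {
        if (batchSize <= 0)
            throw new IllegalArgumentException("batchSize must be positive");

        List<List<T>> res = new ArrayList<>();

        for (int i = 0; i < rows.size(); i += batchSize)
            res.add(new ArrayList<>(rows.subList(i, Math.min(i + batchSize, rows.size()))));

        return res;
    }

    public static void main(String[] args) {
        // Assumed example table and columns.
        String sql = buildInsert("person", Arrays.asList("id", "name"));
        System.out.println(sql);

        List<List<Integer>> parts = batches(Arrays.asList(1, 2, 3, 4, 5), 2);
        System.out.println(parts.size() + " batches");
    }
}
```

Each batch would then be sent as a series of INSERT executions, one per row, 
with the row values bound to the placeholders. This is clearly slower than a 
real streamer, but it needs nothing beyond the SQL support listed above.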

Advantages of using the Java Thin Client in the Spark integration (they all 
follow from the known advantages of the Java Thin Client):
        1. Easy to configure: only the IP addresses of server nodes are 
required.
        2. Easy to deploy: only one additional jar is required. No server-side 
(Ignite worker) configuration is required.
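
As a rough sketch of the first point, the data frame option that today carries 
an absolute path to a configuration file could carry a comma-separated list of 
server addresses instead. The option format and the class and method names 
below are assumptions made for illustration, not an existing API; the only 
fact relied on is that the thin client connector listens on port 10800 by 
default.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns a user-supplied option value such as
// "10.0.0.1, 10.0.0.2:10801" into the host:port list a thin client needs.
class ThinClientAddressesSketch {
    /** Default thin client connector port. */
    static final int DEFAULT_PORT = 10800;

    /** Parses a comma-separated address list, adding the default port if missing. */
    static String[] parseAddresses(String option) {
        List<String> res = new ArrayList<>();

        for (String part : option.split(",")) {
            String addr = part.trim();

            if (addr.isEmpty())
                continue; // Tolerate stray commas and whitespace.

            // Handles host names and IPv4 only; IPv6 literals are out of scope here.
            res.add(addr.contains(":") ? addr : addr + ":" + DEFAULT_PORT);
        }

        if (res.isEmpty())
            throw new IllegalArgumentException("No server addresses given");

        return res.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Assumed example addresses.
        System.out.println(Arrays.toString(parseAddresses("10.0.0.1, 10.0.0.2:10801")));
    }
}
```

With something like this, the same option value works on every Spark worker 
and on the master, because it no longer depends on a file being present at a 
particular path.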

I propose to implement Spark Data Frame integration through Java Thin Client.

Thoughts?

[1] https://apacheignite-fs.readme.io/docs/installation-deployment
[2] https://apacheignite-fs.readme.io/docs/ignite-data-frame#section-ignite-dataframe-options

