RE: Code review - Spark SQL command-line client for Cassandra

Matthew Johnson Mon, 22 Jun 2015 02:17:44 -0700

Thanks Mohammed, it’s good to know I’m not alone!



How easy is it to integrate Zeppelin with Spark on Cassandra? It looks like
it would only support Hadoop out of the box. Is it just a case of dropping
the Cassandra Connector onto the Spark classpath?



Cheers,

Matthew



*From:* Mohammed Guller [mailto:[email protected]]
*Sent:* 20 June 2015 17:27
*To:* shahid ashraf
*Cc:* Matthew Johnson; [email protected]
*Subject:* RE: Code review - Spark SQL command-line client for Cassandra



It is a simple Play-based web application. It exposes a URI for submitting
a SQL query and then executes that query using the CassandraSQLContext
provided by the Spark Cassandra Connector. Since it is web-based, I added an
authentication and authorization layer to make sure that only users with
the right authorization can use it.
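For illustration, the authenticate-then-authorize-then-execute flow described above might look roughly like the sketch below. It is framework-agnostic rather than Play code, and every name in it (QueryExecutor, QueryService, QueryResponse, the token-to-user map) is hypothetical; in the real service the executor would presumably wrap CassandraSQLContext.sql(...).

```java
import java.util.Map;
import java.util.Set;

// Hypothetical abstraction over the query engine; in the service described
// above this would wrap a call such as CassandraSQLContext.sql(...).
interface QueryExecutor {
    String execute(String sql);
}

// Minimal result holder: an HTTP-style status code plus a response body.
final class QueryResponse {
    final int status;
    final String body;

    QueryResponse(int status, String body) {
        this.status = status;
        this.body = body;
    }
}

// Hypothetical request handler: authentication, then authorization, then execution.
final class QueryService {
    private final Map<String, String> tokensToUsers; // API token -> user name
    private final Set<String> authorizedUsers;       // users allowed to run SQL
    private final QueryExecutor executor;

    QueryService(Map<String, String> tokensToUsers, Set<String> authorizedUsers,
                 QueryExecutor executor) {
        this.tokensToUsers = tokensToUsers;
        this.authorizedUsers = authorizedUsers;
        this.executor = executor;
    }

    QueryResponse handle(String token, String sql) {
        // Authentication: map the caller's token to a known user.
        String user = token == null ? null : tokensToUsers.get(token);
        if (user == null) {
            return new QueryResponse(401, "unknown or missing token");
        }
        // Authorization: is this user allowed to run queries at all?
        if (!authorizedUsers.contains(user)) {
            return new QueryResponse(403, "user " + user + " may not run queries");
        }
        // Reject empty queries before they reach the cluster.
        if (sql == null || sql.trim().isEmpty()) {
            return new QueryResponse(400, "empty query");
        }
        return new QueryResponse(200, executor.execute(sql));
    }
}
```

Keeping the executor behind an interface means the access checks can be tested without a Spark cluster.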



I am happy to open-source that code if there is interest. Just need to
carve out some time to clean it up and remove all the other services that
this web application provides.



Mohammed



*From:* shahid ashraf [mailto:[email protected]]
*Sent:* Saturday, June 20, 2015 6:52 AM
*To:* Mohammed Guller
*Cc:* Matthew Johnson; [email protected]
*Subject:* RE: Code review - Spark SQL command-line client for Cassandra



Hi Mohammed,
Can you provide more info about the service you developed?

On Jun 20, 2015 7:59 AM, "Mohammed Guller" <[email protected]> wrote:

Hi Matthew,

It looks fine to me. I have built a similar service that allows a user to
submit a query from a browser and returns the result in JSON format.
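Spark's DataFrame.toJSON() can already turn each row into a JSON string. As a rough illustration of the response shape only, the sketch below hand-rolls the encoding, assuming the rows have already been collected to the driver as column-name-to-value maps. JsonRows and its methods are hypothetical names, not part of Spark or the connector.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical encoder: collected rows -> JSON array of objects.
// Pass LinkedHashMap rows if column order should be preserved in the output.
final class JsonRows {
    // Escapes the characters JSON requires inside a string literal.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Numbers and booleans are emitted bare, null as null, anything else as a string.
    // Non-finite doubles (NaN, Infinity) are not valid JSON and are not handled here.
    static String value(Object v) {
        if (v == null) return "null";
        if (v instanceof Number || v instanceof Boolean) return v.toString();
        return quote(v.toString());
    }

    // Renders rows (column name -> value) as a JSON array of objects.
    static String toJson(List<Map<String, Object>> rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            Iterator<Map.Entry<String, Object>> it = rows.get(i).entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Object> e = it.next();
                sb.append(quote(e.getKey())).append(':').append(value(e.getValue()));
                if (it.hasNext()) sb.append(',');
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }
}
```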



Another alternative is to leave a Spark shell or notebook session (Spark
Notebook, Zeppelin, etc.) open and run queries from there. This model works
only if people give you the queries to execute.



Mohammed



*From:* Matthew Johnson [mailto:[email protected]]
*Sent:* Friday, June 19, 2015 2:20 AM
*To:* [email protected]
*Subject:* Code review - Spark SQL command-line client for Cassandra



Hi all,



I have been struggling with Cassandra’s lack of ad hoc query support (I know
this is an anti-pattern in Cassandra, but sometimes management come over
and ask me to run something, and it’s impossible to explain that it will
take me a while when it would take about 10 seconds in MySQL), so I have put
together the following code snippet that bundles DataStax’s Spark Cassandra
Connector and lets you submit Spark SQL to it, writing the results to a
text file.



Does anyone spot any obvious flaws in this plan? (I have a lot more error
handling etc. in my code, but removed it here for brevity.)



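    package com.algomi.spark;

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.cassandra.CassandraSQLContext;
    import org.slf4j.Logger;          // assumes SLF4J, which ships with Spark
    import org.slf4j.LoggerFactory;

    public class JavaSparkSQL {

        private static final Logger LOG = LoggerFactory.getLogger(JavaSparkSQL.class);

        private final SparkConf conf;

        public JavaSparkSQL(SparkConf conf) {
            this.conf = conf;
        }

        private void run(String sqlQuery) {
            SparkContext scc = new SparkContext(conf);
            try {
                // CassandraSQLContext resolves keyspace.table names against Cassandra
                CassandraSQLContext csql = new CassandraSQLContext(scc);
                DataFrame sql = csql.sql(sqlQuery);

                // Fresh folder per run: saveAsTextFile fails if the path already exists.
                // If the default filesystem is local, each executor writes its part
                // files to its own /tmp; a shared path (e.g. HDFS) keeps them together.
                String folderName = "/tmp/output_" + System.currentTimeMillis();
                LOG.info("Attempting to save SQL results in folder: " + folderName);
                sql.rdd().saveAsTextFile(folderName);
                LOG.info("SQL results saved");
            } finally {
                scc.stop(); // release cluster resources even if the query fails
            }
        }

        public static void main(String[] args) {
            String sparkMasterUrl = args[0];
            String sparkHost = args[1];   // Cassandra contact point
            String sqlQuery = args[2];

            SparkConf conf = new SparkConf();
            conf.setAppName("Java Spark SQL");
            conf.setMaster(sparkMasterUrl);
            conf.set("spark.cassandra.connection.host", sparkHost);

            JavaSparkSQL app = new JavaSparkSQL(conf);
            app.run(sqlQuery);
        }
    }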



I can then submit this to Spark with ‘spark-submit’:



    ./spark-submit --class com.algomi.spark.JavaSparkSQL --master spark://sales3:7077 \
        spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
        spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"
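The master URL appears twice in that command. According to Spark's configuration docs, values set directly on a SparkConf take precedence over spark-submit flags, so the positional argument passed to conf.setMaster wins over --master. One possible tidy-up, sketched below with a hypothetical ClientArgs helper, is to make the master argument optional and call setMaster only when it is supplied.

```java
import java.util.Optional;

// Hypothetical holder for the client's command-line arguments.
final class ClientArgs {
    final Optional<String> masterUrl; // empty -> leave it to spark-submit's --master
    final String cassandraHost;
    final String sqlQuery;

    private ClientArgs(Optional<String> masterUrl, String cassandraHost, String sqlQuery) {
        this.masterUrl = masterUrl;
        this.cassandraHost = cassandraHost;
        this.sqlQuery = sqlQuery;
    }

    // Accepts either <cassandraHost> <sqlQuery> or <masterUrl> <cassandraHost> <sqlQuery>.
    static ClientArgs parse(String[] args) {
        if (args.length == 2) {
            return new ClientArgs(Optional.empty(), args[0], args[1]);
        }
        if (args.length == 3) {
            return new ClientArgs(Optional.of(args[0]), args[1], args[2]);
        }
        throw new IllegalArgumentException(
            "usage: [masterUrl] <cassandraHost> <sqlQuery> (got " + args.length + " args)");
    }
}
```

In main, the call would then become something like `parsed.masterUrl.ifPresent(conf::setMaster);`, so the two-argument form defers entirely to the --master flag.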



It seems to work well, so I’m pretty happy, but I’m wondering why this
isn’t common practice (at least I haven’t been able to find much about it
on Google). Is there something terrible that I’m missing?



Thanks!

Matthew
