RE: Code review - Spark SQL command-line client for Cassandra

shahid ashraf Sat, 20 Jun 2015 06:51:53 -0700

Hi Mohammed,
Can you provide more info about the service you developed?
On Jun 20, 2015 7:59 AM, "Mohammed Guller" <moham...@glassbeam.com> wrote:


>  Hi Matthew,
>
> It looks fine to me. I have built a similar service that allows a user to
> submit a query from a browser and returns the result in JSON format.
>
>
>
> Another alternative is to leave a Spark shell or one of the notebooks
> (Spark Notebook, Zeppelin, etc.) session open and run queries from there.
> This model works only if people give you the queries to execute.
>
>
>
> Mohammed
>
>
>
> *From:* Matthew Johnson [mailto:matt.john...@algomi.com]
> *Sent:* Friday, June 19, 2015 2:20 AM
> *To:* user@spark.apache.org
> *Subject:* Code review - Spark SQL command-line client for Cassandra
>
>
>
> Hi all,
>
>
>
> I have been struggling with Cassandra’s lack of ad hoc query support (I
> know this is an anti-pattern of Cassandra, but sometimes management come
> over and ask me to run stuff and it’s impossible to explain that it will
> take me a while when it would take about 10 seconds in MySQL) so I have put
> together the following code snippet that bundles DataStax’s Cassandra Spark
> connector and allows you to submit Spark SQL to it, outputting the results
> in a text file.
>
>
>
> Does anyone spot any obvious flaws in this plan? (I have a lot more error
> handling etc in my code, but removed it here for brevity)
>
>
>
>     private void run(String sqlQuery) {
>         // conf: the SparkConf handed to the JavaSparkSQL constructor
>         // (the field and constructor are omitted from this excerpt)
>         SparkContext scc = new SparkContext(conf);
>         CassandraSQLContext csql = new CassandraSQLContext(scc);
>         DataFrame sql = csql.sql(sqlQuery);
>
>         // Each run writes to its own timestamped folder
>         String folderName = "/tmp/output_" + System.currentTimeMillis();
>         LOG.info("Attempting to save SQL results in folder: " + folderName);
>         sql.rdd().saveAsTextFile(folderName);
>         LOG.info("SQL results saved");
>     }
>
>     public static void main(String[] args) {
>         // Positional arguments: Spark master URL, Cassandra host, SQL query
>         String sparkMasterUrl = args[0];
>         String sparkHost = args[1];
>         String sqlQuery = args[2];
>
>         SparkConf conf = new SparkConf();
>         conf.setAppName("Java Spark SQL");
>         conf.setMaster(sparkMasterUrl);
>         conf.set("spark.cassandra.connection.host", sparkHost);
>
>         JavaSparkSQL app = new JavaSparkSQL(conf);
>
>         // run() takes only the query in this excerpt
>         app.run(sqlQuery);
>     }
>
>
>
> I can then submit this to Spark with ‘spark-submit’:
>
>
>
>     ./spark-submit --class com.algomi.spark.JavaSparkSQL --master \
>       spark://sales3:7077 \
>       spark-on-cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar \
>       spark://sales3:7077 sales3 "select * from mykeyspace.operationlog"
>
>
>
> It seems to work pretty well, so I’m pretty happy, but wondering why this
> isn’t common practice (at least I haven’t been able to find much about it
> on Google) – is there something terrible that I’m missing?
>
>
>
> Thanks!
>
> Matthew
>
>
>
>
>
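
One small thing about the quoted main(): it reads args[0] to args[2]
without checking them, so a missing argument fails with an
ArrayIndexOutOfBoundsException rather than a usage message. Matthew says
his real code has more error handling, so this may already be covered.
Below is a minimal sketch of that check, assuming a hypothetical helper
class named CliArgs (the class, its fields, and the usage text are
illustrative and not from the original code). It uses no Spark classes, so
it can be tried on its own.

```java
// Hypothetical helper, not part of the original post: validates the three
// positional arguments that the quoted main() reads from args[0..2].
final class CliArgs {
    final String sparkMasterUrl;
    final String cassandraHost;
    final String sqlQuery;

    private CliArgs(String sparkMasterUrl, String cassandraHost, String sqlQuery) {
        this.sparkMasterUrl = sparkMasterUrl;
        this.cassandraHost = cassandraHost;
        this.sqlQuery = sqlQuery;
    }

    // Returns the parsed arguments, or throws IllegalArgumentException
    // with a usage message if the count is wrong or a value is blank.
    static CliArgs parse(String[] args) {
        if (args == null || args.length != 3) {
            throw new IllegalArgumentException(
                "Usage: JavaSparkSQL <spark-master-url> <cassandra-host> <sql-query>");
        }
        for (String arg : args) {
            if (arg == null || arg.trim().isEmpty()) {
                throw new IllegalArgumentException("Arguments must not be blank");
            }
        }
        return new CliArgs(args[0].trim(), args[1].trim(), args[2].trim());
    }

    // Same naming scheme as the quoted run(): "/tmp/output_" + timestamp.
    static String outputFolder(long timestampMillis) {
        return "/tmp/output_" + timestampMillis;
    }
}
```

In main(), CliArgs.parse(args) would run before the SparkConf is built,
so a bad invocation fails before any Spark context is started.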
