Hello.

I have been working to add SparkSQL HDFS support to our application.  We're
able to process streaming data, append to a persistent Hive table, and have
that table available via JDBC/ODBC.  Now we're looking to access data in
Cassandra via SparkSQL.

From reading a number of previous posts, it appears that the way to do
this is to instantiate a SparkContext, read the data into an RDD using
the Cassandra Spark Connector, convert it to a DataFrame, and register
that as a temporary table.  The data will then be accessible via
SparkSQL, although I assume the temporary table would need to be
refreshed periodically to pick up new writes.
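
Here is a minimal sketch of that flow as I understand it.  The keyspace
("ks"), table ("users"), schema, and connection host are illustrative
placeholders, not our actual data model:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.datastax.spark.connector._

// Illustrative row type; "ks"/"users" below are placeholder names.
case class User(id: Int, name: String)

object CassandraToSparkSQL {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("CassandraToSparkSQL")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Read the Cassandra table into an RDD via the connector, convert
    // it to a DataFrame, and register it as a temporary table.
    val rdd = sc.cassandraTable[User]("ks", "users")
    val df = rdd.toDF()
    df.registerTempTable("users")

    // Queryable via SparkSQL for the lifetime of this context.
    sqlContext.sql("SELECT count(*) FROM users").show()
  }
}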

Is there a more straightforward way to do this?  Is it possible to register
the Cassandra table with Hive so that the SparkSQL thrift server instance
can just read data directly?
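
To make the question concrete, something along these lines is what I'm
hoping is possible, assuming the connector's data source can be
registered as a persistent table in the Hive metastore (again, the
names are placeholders, and I don't know that this actually works;
"sc" is the SparkContext from the sketch above):

import org.apache.spark.sql.hive.HiveContext

// Hypothetical: persist the Cassandra mapping in the Hive metastore
// via the connector's data source, so the thrift server can query it
// directly without re-registering a temp table in each session.
val hiveContext = new HiveContext(sc)
hiveContext.sql(
  """CREATE TABLE users
    |USING org.apache.spark.sql.cassandra
    |OPTIONS (keyspace "ks", table "users")""".stripMargin)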

Regards,

Bryan Jeffrey
