Hi Bryan,

Yes, you can query a real Cassandra cluster. You just need to provide the address of the Cassandra seed node.
Looks like you figured out the answer. You can also put the C* seed node address in the spark-defaults.conf file under the SPARK_HOME/conf directory. Then you don't need to manually SET it for each Beeline session.

Mohammed

From: Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
Sent: Thursday, November 12, 2015 10:26 AM
To: Mohammed Guller
Cc: user
Subject: Re: Cassandra via SparkSQL/Hive JDBC

Answer: In beeline run the following:

SET spark.cassandra.connection.host="10.0.0.10"

On Thu, Nov 12, 2015 at 1:13 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:

Mohammed,

While you're willing to answer questions, is there a trick to getting the Hive Thrift server to connect to remote Cassandra instances?

0: jdbc:hive2://localhost:10000> SET spark.cassandra.connection.host="cassandrahost";
+-----------------------------------------------------------+
|                                                           |
+-----------------------------------------------------------+
| spark.cassandra.connection.host="cassandrahost"           |
+-----------------------------------------------------------+
1 row selected (0.018 seconds)

0: jdbc:hive2://localhost:10000> create temporary table cdr using org.apache.spark.sql.cassandra OPTIONS ( keyspace "c2", table "detectionresult" );
Error: java.io.IOException: Failed to open native connection to Cassandra at {10.0.0.4}:9042 (state=,code=0)

This seems to be connecting to local host regardless of the value I set spark.cassandra.connection.host to.

Regards,

Bryan Jeffrey

On Thu, Nov 12, 2015 at 12:54 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:

Yes, I do - I found your example of doing that later in your slides. Thank you for your help!
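[Editor's note: Mohammed's spark-defaults.conf suggestion above can be sketched as follows. The seed-node address reuses the hypothetical 10.0.0.10 from the Beeline example in this thread; substitute your own.]

```properties
# SPARK_HOME/conf/spark-defaults.conf
# Set once here so every Beeline/Thrift-server session picks it up
# without a per-session SET statement.
spark.cassandra.connection.host    10.0.0.10
```

Note that the Thrift server reads spark-defaults.conf at startup, so it must be restarted after this change takes effect.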
On Thu, Nov 12, 2015 at 12:20 PM, Mohammed Guller <moham...@glassbeam.com> wrote:

Did you mean Hive or the Spark SQL JDBC/ODBC server?

Mohammed

From: Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
Sent: Thursday, November 12, 2015 9:12 AM
To: Mohammed Guller
Cc: user
Subject: Re: Cassandra via SparkSQL/Hive JDBC

Mohammed,

That is great. It looks like a perfect scenario. Would I be able to make the created DF queryable over the Hive JDBC/ODBC server?

Regards,

Bryan Jeffrey

On Wed, Nov 11, 2015 at 9:34 PM, Mohammed Guller <moham...@glassbeam.com> wrote:

Short answer: yes.

The Spark Cassandra Connector supports the data source API, so you can create a DataFrame that points directly to a Cassandra table. You can query it using the DataFrame API or the SQL/HiveQL interface. If you want to see an example, see slides 27 and 28 in this deck that I presented at Cassandra Summit 2015:

http://www.slideshare.net/mg007/ad-hoc-analytics-with-cassandra-and-spark

Mohammed

From: Bryan [mailto:bryan.jeff...@gmail.com]
Sent: Tuesday, November 10, 2015 7:42 PM
To: Bryan Jeffrey; user
Subject: RE: Cassandra via SparkSQL/Hive JDBC

Anyone have thoughts or a similar use-case for SparkSQL / Cassandra?

Regards,

Bryan Jeffrey

________________________________
From: Bryan Jeffrey <bryan.jeff...@gmail.com>
Sent: 11/4/2015 11:16 AM
To: user@spark.apache.org
Subject: Cassandra via SparkSQL/Hive JDBC

Hello.

I have been working to add SparkSQL HDFS support to our application. We're able to process streaming data, append to a persistent Hive table, and have that table available via JDBC/ODBC. Now we're looking to access data in Cassandra via SparkSQL.
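[Editor's note: the data source API route Mohammed describes can be sketched in Scala roughly as below. The keyspace, table, and host come from examples elsewhere in this thread; it assumes a Spark 1.x application with the Spark Cassandra Connector on the classpath.]

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("CassandraSQL")
  // Seed node address; 10.0.0.10 is the hypothetical host from the thread.
  .set("spark.cassandra.connection.host", "10.0.0.10")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

// A DataFrame backed directly by a Cassandra table via the
// connector's data source API -- no intermediate RDD needed.
val df = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "c2", "table" -> "detectionresult"))
  .load()

// Register it so it is queryable through SQL/HiveQL.
df.registerTempTable("cdr")
sqlContext.sql("SELECT COUNT(*) FROM cdr").show()
```

Because the DataFrame points at the Cassandra table itself rather than a cached copy, queries against it should read current data.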
In reading a number of previous posts, it appears that the way to do this is to instantiate a SparkContext, read the data into an RDD using the Cassandra Spark Connector, convert the data to a DataFrame, and register it as a temporary table. The data will then be accessible via SparkSQL - although I assume that you would need to refresh the table on a periodic basis.

Is there a more straightforward way to do this? Is it possible to register the Cassandra table with Hive so that the SparkSQL thrift server instance can just read data directly?

Regards,

Bryan Jeffrey
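[Editor's note: the RDD-based flow Bryan describes can be sketched as below. The case class schema is purely hypothetical (the thread never shows the detectionresult columns), and it assumes Spark 1.x with the Spark Cassandra Connector on the classpath.]

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.datastax.spark.connector._  // adds sc.cassandraTable

// Hypothetical schema for the c2.detectionresult table.
case class DetectionResult(id: String, score: Double)

val sc = new SparkContext(new SparkConf()
  .setAppName("CassandraRDD")
  .set("spark.cassandra.connection.host", "10.0.0.10"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._  // enables rdd.toDF()

// Read the Cassandra table into a typed RDD via the connector,
// convert it to a DataFrame, and register it for SQL access.
val rdd = sc.cassandraTable[DetectionResult]("c2", "detectionresult")
rdd.toDF().registerTempTable("cdr")

sqlContext.sql("SELECT COUNT(*) FROM cdr").show()
```

As the thread goes on to establish, the data source API (`USING org.apache.spark.sql.cassandra` from Beeline, or `sqlContext.read.format(...)`) avoids this manual RDD step entirely.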