I hesitate to ask further questions, but your assistance is advancing my work much faster than extensive fiddling might. I am seeing the following error when querying:
0: jdbc:hive2://localhost:10000> create temporary table cassandraeventcounts using org.apache.spark.sql.cassandra OPTIONS ( keyspace "c2", table "eventcounts" );
Error: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.cassandra.DataTypeConverter$ (state=,code=0)

I started the Thrift server as follows:

root@sparkdev1:~# /spark/spark-1.4.1/sbin/start-thriftserver.sh --master spark://10.0.0.4:7077 --packages com.datastax.spark:spark-cassandra-connector_2.11:1.5.0-M1 --hiveconf "spark.cores.max=2" --hiveconf "spark.executor.memory=2g"

Do I perhaps need to include an additional library to do the default conversion?

Regards,

Bryan Jeffrey

On Thu, Nov 12, 2015 at 1:57 PM, Mohammed Guller <moham...@glassbeam.com> wrote:

> Hi Bryan,
>
> Yes, you can query a real Cassandra cluster. You just need to provide the
> address of the Cassandra seed node.
>
> Looks like you figured out the answer. You can also put the C* seed node
> address in the spark-defaults.conf file under the SPARK_HOME/conf
> directory. Then you don’t need to manually SET it for each Beeline session.
>
> Mohammed
>
> *From:* Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
> *Sent:* Thursday, November 12, 2015 10:26 AM
> *To:* Mohammed Guller
> *Cc:* user
> *Subject:* Re: Cassandra via SparkSQL/Hive JDBC
>
> Answer: In beeline run the following:
> SET spark.cassandra.connection.host="10.0.0.10"
>
> On Thu, Nov 12, 2015 at 1:13 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
>
> Mohammed,
>
> While you're willing to answer questions, is there a trick to getting the
> Hive Thrift server to connect to remote Cassandra instances?
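For reference, Mohammed's spark-defaults.conf suggestion would look roughly like the fragment below. This is a sketch only: the host address reuses the value mentioned elsewhere in this thread, and your actual seed node address will differ.

```
# SPARK_HOME/conf/spark-defaults.conf
# Example value only -- substitute your actual Cassandra seed node address.
spark.cassandra.connection.host    10.0.0.10
```

With this in place, every Beeline session served by the Thrift server picks up the connection host without a per-session SET.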
> 0: jdbc:hive2://localhost:10000> SET spark.cassandra.connection.host="cassandrahost";
> +-----------------------------------------------------------+
> |                                                           |
> +-----------------------------------------------------------+
> | spark.cassandra.connection.host="cassandrahost"           |
> +-----------------------------------------------------------+
> 1 row selected (0.018 seconds)
> 0: jdbc:hive2://localhost:10000> create temporary table cdr using
> org.apache.spark.sql.cassandra OPTIONS ( keyspace "c2", table
> "detectionresult" );
> Error: java.io.IOException: Failed to open native connection to Cassandra
> at {10.0.0.4}:9042 (state=,code=0)
>
> This seems to be connecting to localhost regardless of the value I set
> spark.cassandra.connection.host to.
>
> Regards,
>
> Bryan Jeffrey
>
> On Thu, Nov 12, 2015 at 12:54 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
>
> Yes, I do - I found your example of doing that later in your slides.
> Thank you for your help!
>
> On Thu, Nov 12, 2015 at 12:20 PM, Mohammed Guller <moham...@glassbeam.com> wrote:
>
> Did you mean Hive or Spark SQL JDBC/ODBC server?
>
> Mohammed
>
> *From:* Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
> *Sent:* Thursday, November 12, 2015 9:12 AM
> *To:* Mohammed Guller
> *Cc:* user
> *Subject:* Re: Cassandra via SparkSQL/Hive JDBC
>
> Mohammed,
>
> That is great. It looks like a perfect scenario. Would I be able to make
> the created DF queryable over the Hive JDBC/ODBC server?
>
> Regards,
>
> Bryan Jeffrey
>
> On Wed, Nov 11, 2015 at 9:34 PM, Mohammed Guller <moham...@glassbeam.com> wrote:
>
> Short answer: yes.
>
> The Spark Cassandra Connector supports the data source API. So you can
> create a DataFrame that points directly to a Cassandra table.
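One way to avoid the per-session SET shown above (which did not appear to take effect) is to pass the property when starting the Thrift server via spark-submit's standard `--conf` flag, which start-thriftserver.sh passes through. The sketch below reuses the paths and addresses from this thread. Note the `_2.10` connector artifact: the prebuilt Spark 1.4.x binaries are built against Scala 2.10, and mixing a `_2.11` connector with a 2.10 build is a plausible (unconfirmed here) cause of the `NoClassDefFoundError` seen earlier.

```
root@sparkdev1:~# /spark/spark-1.4.1/sbin/start-thriftserver.sh \
  --master spark://10.0.0.4:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M1 \
  --conf spark.cassandra.connection.host=10.0.0.10 \
  --hiveconf "spark.cores.max=2" \
  --hiveconf "spark.executor.memory=2g"
```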
> You can query it using the DataFrame API or the SQL/HiveQL interface.
>
> If you want to see an example, see slides 27 and 28 in this deck that I
> presented at the Cassandra Summit 2015:
>
> http://www.slideshare.net/mg007/ad-hoc-analytics-with-cassandra-and-spark
>
> Mohammed
>
> *From:* Bryan [mailto:bryan.jeff...@gmail.com]
> *Sent:* Tuesday, November 10, 2015 7:42 PM
> *To:* Bryan Jeffrey; user
> *Subject:* RE: Cassandra via SparkSQL/Hive JDBC
>
> Anyone have thoughts or a similar use-case for SparkSQL / Cassandra?
>
> Regards,
>
> Bryan Jeffrey
>
> ------------------------------
>
> *From: *Bryan Jeffrey <bryan.jeff...@gmail.com>
> *Sent: *11/4/2015 11:16 AM
> *To: *user <user@spark.apache.org>
> *Subject: *Cassandra via SparkSQL/Hive JDBC
>
> Hello.
>
> I have been working to add SparkSQL HDFS support to our application.
> We're able to process streaming data, append to a persistent Hive table,
> and have that table available via JDBC/ODBC. Now we're looking to access
> data in Cassandra via SparkSQL.
>
> From reading a number of previous posts, it appears that the way to do this
> is to instantiate a Spark Context, read the data into an RDD using the
> Cassandra Spark Connector, convert the data to a DataFrame, and register it
> as a temporary table. The data will then be accessible via SparkSQL -
> although I assume that you would need to refresh the table on a periodic
> basis.
>
> Is there a more straightforward way to do this? Is it possible to
> register the Cassandra table with Hive so that the SparkSQL Thrift server
> instance can just read data directly?
>
> Regards,
>
> Bryan Jeffrey
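A minimal sketch of the data source approach Mohammed describes, assuming Spark 1.4+ with a matching Spark Cassandra Connector, and an existing SparkContext `sc` whose configuration includes `spark.cassandra.connection.host`. The keyspace and table names reuse the ones from earlier in this thread.

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Create a DataFrame that points directly at the Cassandra table --
// no intermediate RDD or periodic manual refresh is required.
val events = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "c2", "table" -> "eventcounts"))
  .load()

// Register it so the SQL/HiveQL interface (and, when this runs inside
// the Thrift server's context, JDBC/ODBC clients) can query it by name.
events.registerTempTable("eventcounts")
sqlContext.sql("SELECT count(*) FROM eventcounts").show()
```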