I hesitate to ask further questions, but your assistance is advancing my work much faster than extensive fiddling might. I am seeing the following error when querying:
0: jdbc:hive2://localhost:10000> create temporary table cassandraeventcounts using org.apache.spark.sql.cassandra OPTIONS ( keyspace "c2", table "eventcounts" );
Error: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.cassandra.DataTypeConverter$ (state=,code=0)

I started the Thrift server as follows:

root@sparkdev1:~# /spark/spark-1.4.1/sbin/start-thriftserver.sh --master spark://10.0.0.4:7077 --packages com.datastax.spark:spark-cassandra-connector_2.11:1.5.0-M1 --hiveconf "spark.cores.max=2" --hiveconf "spark.executor.memory=2g"

Do I perhaps need to include an additional library to do the default conversion?

Regards,

Bryan Jeffrey

On Thu, Nov 12, 2015 at 1:57 PM, Mohammed Guller <moham...@glassbeam.com> wrote:

> Hi Bryan,
>
> Yes, you can query a real Cassandra cluster. You just need to provide the
> address of the Cassandra seed node.
>
> Looks like you figured out the answer. You can also put the C* seed node
> address in the spark-defaults.conf file under the SPARK_HOME/conf
> directory. Then you don’t need to manually SET it for each Beeline session.
>
> Mohammed
>
> *From:* Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
> *Sent:* Thursday, November 12, 2015 10:26 AM
> *To:* Mohammed Guller
> *Cc:* user
> *Subject:* Re: Cassandra via SparkSQL/Hive JDBC
>
> Answer: In beeline run the following:
> SET spark.cassandra.connection.host="10.0.0.10"
>
> On Thu, Nov 12, 2015 at 1:13 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
>
> Mohammed,
>
> While you're willing to answer questions, is there a trick to getting the
> Hive Thrift server to connect to remote Cassandra instances?
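For reference, Mohammed's spark-defaults.conf suggestion would look roughly like the fragment below. This is a sketch only: the host address reuses the value mentioned elsewhere in this thread, and your actual seed node address will differ.

```
# SPARK_HOME/conf/spark-defaults.conf
# Example value only -- substitute your actual Cassandra seed node address.
spark.cassandra.connection.host    10.0.0.10
```

With this in place, every Beeline session served by the Thrift server picks up the connection host without a per-session SET.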
> 0: jdbc:hive2://localhost:10000> SET spark.cassandra.connection.host="cassandrahost";
> +-----------------------------------------------------------+
> |                                                           |
> +-----------------------------------------------------------+
> | spark.cassandra.connection.host="cassandrahost"           |
> +-----------------------------------------------------------+
> 1 row selected (0.018 seconds)
> 0: jdbc:hive2://localhost:10000> create temporary table cdr using
> org.apache.spark.sql.cassandra OPTIONS ( keyspace "c2", table
> "detectionresult" );
> Error: java.io.IOException: Failed to open native connection to Cassandra
> at {10.0.0.4}:9042 (state=,code=0)
>
> This seems to be connecting to localhost regardless of the value I set
> spark.cassandra.connection.host to.
>
> Regards,
>
> Bryan Jeffrey
>
> On Thu, Nov 12, 2015 at 12:54 PM, Bryan Jeffrey <bryan.jeff...@gmail.com> wrote:
>
> Yes, I do - I found your example of doing that later in your slides.
> Thank you for your help!
>
> On Thu, Nov 12, 2015 at 12:20 PM, Mohammed Guller <moham...@glassbeam.com> wrote:
>
> Did you mean Hive or Spark SQL JDBC/ODBC server?
>
> Mohammed
>
> *From:* Bryan Jeffrey [mailto:bryan.jeff...@gmail.com]
> *Sent:* Thursday, November 12, 2015 9:12 AM
> *To:* Mohammed Guller
> *Cc:* user
> *Subject:* Re: Cassandra via SparkSQL/Hive JDBC
>
> Mohammed,
>
> That is great. It looks like a perfect scenario. Would I be able to make
> the created DF queryable over the Hive JDBC/ODBC server?
>
> Regards,
>
> Bryan Jeffrey
>
> On Wed, Nov 11, 2015 at 9:34 PM, Mohammed Guller <moham...@glassbeam.com> wrote:
>
> Short answer: yes.
>
> The Spark Cassandra Connector supports the data source API. So you can
> create a DataFrame that points directly to a Cassandra table.
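One way to avoid the per-session SET shown above (which did not appear to take effect) is to pass the property when starting the Thrift server via spark-submit's standard `--conf` flag, which start-thriftserver.sh passes through. The sketch below reuses the paths and addresses from this thread. Note the `_2.10` connector artifact: the prebuilt Spark 1.4.x binaries are built against Scala 2.10, and mixing a `_2.11` connector with a 2.10 build is a plausible (unconfirmed here) cause of the `NoClassDefFoundError` seen earlier.

```
root@sparkdev1:~# /spark/spark-1.4.1/sbin/start-thriftserver.sh \
  --master spark://10.0.0.4:7077 \
  --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0-M1 \
  --conf spark.cassandra.connection.host=10.0.0.10 \
  --hiveconf "spark.cores.max=2" \
  --hiveconf "spark.executor.memory=2g"
```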
> You can query it using the DataFrame API or the SQL/HiveQL interface.
>
> If you want to see an example, see slides 27 and 28 in this deck that I
> presented at the Cassandra Summit 2015:
>
> http://www.slideshare.net/mg007/ad-hoc-analytics-with-cassandra-and-spark
>
> Mohammed
>
> *From:* Bryan [mailto:bryan.jeff...@gmail.com]
> *Sent:* Tuesday, November 10, 2015 7:42 PM
> *To:* Bryan Jeffrey; user
> *Subject:* RE: Cassandra via SparkSQL/Hive JDBC
>
> Anyone have thoughts or a similar use-case for SparkSQL / Cassandra?
>
> Regards,
>
> Bryan Jeffrey
>
> ------------------------------
>
> *From: *Bryan Jeffrey <bryan.jeff...@gmail.com>
> *Sent: *11/4/2015 11:16 AM
> *To: *user <user@spark.apache.org>
> *Subject: *Cassandra via SparkSQL/Hive JDBC
>
> Hello.
>
> I have been working to add SparkSQL HDFS support to our application.
> We're able to process streaming data, append to a persistent Hive table,
> and have that table available via JDBC/ODBC. Now we're looking to access
> data in Cassandra via SparkSQL.
>
> From reading a number of previous posts, it appears that the way to do this
> is to instantiate a Spark Context, read the data into an RDD using the
> Cassandra Spark Connector, convert the data to a DataFrame, and register it
> as a temporary table. The data will then be accessible via SparkSQL -
> although I assume that you would need to refresh the table on a periodic
> basis.
>
> Is there a more straightforward way to do this? Is it possible to
> register the Cassandra table with Hive so that the SparkSQL Thrift server
> instance can just read data directly?
>
> Regards,
>
> Bryan Jeffrey
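A minimal sketch of the data source approach Mohammed describes, assuming Spark 1.4+ with a matching Spark Cassandra Connector, and an existing SparkContext `sc` whose configuration includes `spark.cassandra.connection.host`. The keyspace and table names reuse the ones from earlier in this thread.

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Create a DataFrame that points directly at the Cassandra table --
// no intermediate RDD or periodic manual refresh is required.
val events = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "c2", "table" -> "eventcounts"))
  .load()

// Register it so the SQL/HiveQL interface (and, when this runs inside
// the Thrift server's context, JDBC/ODBC clients) can query it by name.
events.registerTempTable("eventcounts")
sqlContext.sql("SELECT count(*) FROM eventcounts").show()
```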