Mohammed,
While you're willing to answer questions, is there a trick to getting the
Hive Thrift server to connect to remote Cassandra instances?
0: jdbc:hive2://localhost:10000> SET spark.cassandra.connection.host="cassandrahost";
+-----------------------------------------------------------+
| |
+-----------------------------------------------------------+
| spark.cassandra.connection.host="cassandrahost" |
+-----------------------------------------------------------+
1 row selected (0.018 seconds)
0: jdbc:hive2://localhost:10000> create temporary table cdr using org.apache.spark.sql.cassandra OPTIONS ( keyspace "c2", table "detectionresult" );
Error: java.io.IOException: Failed to open native connection to Cassandra at {10.0.0.4}:9042 (state=,code=0)
The connector seems to connect to the local host regardless of the value I
set for spark.cassandra.connection.host.
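
One thing that may be worth trying, offered as a sketch and not a confirmed fix: supply the Cassandra contact point when the Thrift server is started instead of with SET inside the Beeline session. start-thriftserver.sh accepts the usual spark-submit options, so the property can be passed with --conf. Note also that the SET output above echoes the value with its double quotes, so retrying SET without the quotes may be a cheaper first test. In the script below, the /opt/spark default, the connector jar path, and the build_thrift_cmd helper are all assumptions made for illustration; "cassandrahost" is the placeholder from the session above.

```shell
#!/bin/sh
# Sketch: build the Thrift server start command with the Cassandra host
# passed as a Spark property at startup.
# Assumptions (not from the thread): the default SPARK_HOME, the jar path,
# and the helper name are hypothetical placeholders.
CASSANDRA_HOST="${CASSANDRA_HOST:-cassandrahost}"
SPARK_HOME="${SPARK_HOME:-/opt/spark}"
CONNECTOR_JAR="${CONNECTOR_JAR:-/path/to/spark-cassandra-connector-assembly.jar}"

# Print the full command line so it can be reviewed before running it.
build_thrift_cmd() {
  printf '%s %s %s %s %s\n' \
    "$SPARK_HOME/sbin/start-thriftserver.sh" \
    "--jars" "$CONNECTOR_JAR" \
    "--conf" "spark.cassandra.connection.host=$CASSANDRA_HOST"
}

build_thrift_cmd
```

The script only prints the command; run the printed line once the paths match your installation, then reconnect with Beeline and retry the create temporary table statement.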
Regards,
Bryan Jeffrey
On Thu, Nov 12, 2015 at 12:54 PM, Bryan Jeffrey <[email protected]>
wrote:
> Yes, I do - I found your example of doing that later in your slides.
> Thank you for your help!
>
> On Thu, Nov 12, 2015 at 12:20 PM, Mohammed Guller <[email protected]>
> wrote:
>
>> Did you mean Hive or Spark SQL JDBC/ODBC server?
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* Bryan Jeffrey [mailto:[email protected]]
>> *Sent:* Thursday, November 12, 2015 9:12 AM
>> *To:* Mohammed Guller
>> *Cc:* user
>> *Subject:* Re: Cassandra via SparkSQL/Hive JDBC
>>
>>
>>
>> Mohammed,
>>
>>
>>
>> That is great. It looks like a perfect scenario. Would I be able to make
>> the created DF queryable over the Hive JDBC/ODBC server?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Bryan Jeffrey
>>
>>
>>
>> On Wed, Nov 11, 2015 at 9:34 PM, Mohammed Guller <[email protected]>
>> wrote:
>>
>> Short answer: yes.
>>
>>
>>
>> The Spark Cassandra Connector supports the data source API. So you can
>> create a DataFrame that points directly to a Cassandra table. You can query
>> it using the DataFrame API or the SQL/HiveQL interface.
>>
>>
>>
>> If you want to see an example, see slides 27 and 28 in this deck that I
>> presented at the Cassandra Summit 2015:
>>
>> http://www.slideshare.net/mg007/ad-hoc-analytics-with-cassandra-and-spark
>>
>>
>>
>>
>>
>> Mohammed
>>
>>
>>
>> *From:* Bryan [mailto:[email protected]]
>> *Sent:* Tuesday, November 10, 2015 7:42 PM
>> *To:* Bryan Jeffrey; user
>> *Subject:* RE: Cassandra via SparkSQL/Hive JDBC
>>
>>
>>
>> Anyone have thoughts or a similar use-case for SparkSQL / Cassandra?
>>
>> Regards,
>>
>> Bryan Jeffrey
>> ------------------------------
>>
>> *From: *Bryan Jeffrey <[email protected]>
>> *Sent: *11/4/2015 11:16 AM
>> *To: *user <[email protected]>
>> *Subject: *Cassandra via SparkSQL/Hive JDBC
>>
>> Hello.
>>
>>
>>
>> I have been working to add SparkSQL HDFS support to our application.
>> We're able to process streaming data, append to a persistent Hive table,
>> and have that table available via JDBC/ODBC. Now we're looking to access
>> data in Cassandra via SparkSQL.
>>
>>
>>
>> In reading a number of previous posts, it appears that the way to do this
>> is to instantiate a Spark Context, read the data into an RDD using the
>> Cassandra Spark Connector, convert the data to a DF and register it as a
>> temporary table. The data will then be accessible via SparkSQL - although
>> I assume that you would need to refresh the table on a periodic basis.
>>
>>
>>
>> Is there a more straightforward way to do this? Is it possible to
>> register the Cassandra table with Hive so that the SparkSQL thrift server
>> instance can just read data directly?
>>
>>
>>
>> Regards,
>>
>>
>>
>> Bryan Jeffrey
>>
>>
>>
>
>