Yes, it is simple to access Cassandra data from the Spark shell.
Step 1: Launch the spark-shell with the spark-cassandra-connector package
$SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.10:1.5.0
Step 2: Create a DataFrame pointing to your Cassandra table
val dfCassTable = sqlContext.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "your_column_family", "keyspace" -> "your_keyspace"))
  .load()
From this point onward, you have complete access to the DataFrame API. You can
even register it as a temporary table, if you would prefer to use SQL/HiveQL.
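
As a rough sketch of what that can look like, the snippet below continues the same spark-shell session. It assumes dfCassTable was created as in Step 2 and that you are on a Spark 1.x shell, where sqlContext is predefined. The column name "your_column" and the table name "cass_table" are placeholders I made up for illustration, so substitute your own.

```scala
// Paste into the same spark-shell session, after dfCassTable has been created.
// "your_column" is a hypothetical column name; replace it with a real one.

// Inspect the schema that the connector derived from the Cassandra table.
dfCassTable.printSchema()

// DataFrame API: keep rows where the column is not null and show a few of them.
dfCassTable.filter(dfCassTable("your_column").isNotNull).show(5)

// SQL: register the DataFrame as a temporary table, then query it by name.
// "cass_table" is an arbitrary name chosen for this example.
dfCassTable.registerTempTable("cass_table")
sqlContext.sql("SELECT COUNT(*) AS row_count FROM cass_table").show()
```

Note that registerTempTable is the Spark 1.x call. On Spark 2.x and later it is deprecated in favor of createOrReplaceTempView.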
Mohammed
Author: Big Data Analytics with Spark <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
From: Ben Slater [mailto:[email protected]]
Sent: Monday, May 9, 2016 9:28 PM
To: [email protected]; user
Subject: Re: Accessing Cassandra data from Spark Shell
You can use the Spark shell to access Cassandra via the Spark Cassandra connector.
The getting-started article on our support page should give you a good steer,
even if you’re not using Instaclustr:
https://support.instaclustr.com/hc/en-us/articles/213097877-Getting-Started-with-Instaclustr-Spark-Cassandra-
Cheers
Ben
On Tue, 10 May 2016 at 14:08 Cassa L <[email protected]> wrote:
Hi,
Has anyone tried accessing Cassandra data using the Spark shell? How do you do it?
Can you use HiveContext for Cassandra data? I'm using the community version of
Cassandra 3.0.
Thanks,
LCassa
--
————————
Ben Slater
Chief Product Officer, Instaclustr
+61 437 929 798