I believe DataStax is working on better integration here, but until that is ready you can use the applySchema API. Basically, you convert the CassandraTable into an RDD of Row objects with a .map() and then call applySchema (provided by SQLContext) to get a SchemaRDD.
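As a rough sketch (the column names and types below are made up for illustration; substitute the ones from your actual table, and note that registerTempTable is the 1.1 name for registerAsTable):

import com.datastax.spark.connector._
import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)

// Describe the table layout explicitly. Here we assume a hypothetical
// table with an int "id" column and a text "name" column.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("name", StringType, nullable = true)))

// Map each CassandraRow to a Spark SQL Row. This stays distributed,
// so there is no need to collect the data with toArray.
val rowRDD = sc.cassandraTable("<keyspace>", "<column_family>")
  .map(r => Row(r.getInt("id"), r.getString("name")))

// Apply the schema and register the result for SQL queries.
val schemaRDD = sqlContext.applySchema(rowRDD, schema)
schemaRDD.registerTempTable("objects")

This avoids both the toArray collect and the JSON round trip, since the schema is supplied directly instead of being inferred by jsonRDD.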
More details will be available in the SQL Programming Guide for 1.1 (which will hopefully be published in the next day or two). You can see the raw version here:
https://raw.githubusercontent.com/apache/spark/master/docs/sql-programming-guide.md
Look for the section "Programmatically Specifying the Schema".

On Mon, Sep 8, 2014 at 7:22 AM, gtinside <gtins...@gmail.com> wrote:
> Hi,
>
> I am reading data from Cassandra through the DataStax spark-cassandra
> connector, converting it into JSON, and then running Spark SQL on it.
> Refer to the code snippet below:
>
> step 1 >>>>> val o_rdd = sc.cassandraTable[CassandraRDDWrapper](
>     '<keyspace>', '<column_family>')
> step 2 >>>>> val tempObjectRDD = sc.parallelize(o_rdd.toArray.map(i=>i), 100)
> step 3 >>>>> val objectRDD = sqlContext.jsonRDD(tempObjectRDD)
> step 4 >>>>> objectRDD.registerAsTable("objects")
>
> At step (2) I have to explicitly do a "toArray" because jsonRDD takes an
> RDD[String]. For me, calling "toArray" on the Cassandra RDD takes forever,
> as I have a million records in Cassandra. Is there a better way of doing
> this? How can I optimize it?