Hi,

You can convert a standard RDD of a Product subclass (e.g. a case class) to a SchemaRDD using SQLContext. Load the data from Cassandra into an RDD of your case class, convert it to a SchemaRDD, and register it as a table; then you can use it in your SQL queries.
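A minimal sketch of that flow (assuming a hypothetical Entry case class for the two columns in your example, and the DataStax connector's cassandraTable as in your snippet) might look like this:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import com.datastax.spark.connector._   // adds cassandraTable to SparkContext

// Hypothetical case class mirroring the columns used in the query.
case class Entry(id: Long, xmlfield: String)

val sc = new SparkContext(new SparkConf().setAppName("cassandra-sql"))
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD       // implicit RDD[Product] -> SchemaRDD conversion

// Load from Cassandra and map each row into the case class.
// Any plain-Scala processing (e.g. parsing the XML field) could also happen
// in this step, before the data reaches Spark SQL.
val entries = sc.cassandraTable("ks", "cf")
  .map(row => Entry(row.getLong("id"), row.getString("xmlfield")))

// The implicit conversion turns RDD[Entry] into a SchemaRDD;
// registering it makes it queryable by name.
entries.registerAsTable("cf")

val res = sqlContext.sql("SELECT id, xmlfield FROM cf")
res.collect().foreach(println)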
http://spark.apache.org/docs/latest/sql-programming-guide.html#running-sql-on-rdds

Thanks.

2014-07-04 17:59 GMT+09:00 Martin Gammelsæter <martingammelsae...@gmail.com>:

> Hi!
>
> I have a Spark cluster running on top of a Cassandra cluster, using
> Datastax' new driver, and one of the fields of my RDDs is an
> XML-string. In a normal Scala Spark job, parsing that data is no
> problem, but I would like to also make that information available
> through Spark SQL. So, is there any way to write user defined
> functions for Spark SQL? I know that a HiveContext is available, but I
> understand that that is for querying data from Hive, and I don't have
> Hive in my stack (please correct me if I'm wrong).
>
> I would love to be able to do something like the following:
>
> val casRdd = sparkCtx.cassandraTable("ks", "cf")
>
> // registerAsTable etc
>
> val res = sql("SELECT id, xmlGetTag(xmlfield, 'sometag') FROM cf")
>
> --
> Best regards,
> Martin Gammelsæter

--
Takuya UESHIN
Tokyo, Japan

http://twitter.com/ueshin