I am using the Calliope Cassandra-Spark connector (http://tuplejump.github.io/calliope/), which is quite handy and easy to use! The only problem is that it is a bit outdated: it works with Spark 1.1.0. Hopefully a new version comes soon.
best,
/Shahab

On Wed, Feb 11, 2015 at 2:51 PM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:

> I just finished a scala course, nice exercise to check what I learned :D
>
> Thanks for the answer!
>
> From: user@cassandra.apache.org
> Subject: Re: best supported spark connector for Cassandra
>
> Start looking at the Spark/Cassandra connector here (in Scala):
> https://github.com/datastax/spark-cassandra-connector/tree/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector
>
> Data locality is provided by this method:
> https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/CassandraRDD.scala#L329-L336
>
> Start digging from this all the way down the code.
>
> As for Stratio Deep, I can't tell how the did the integration with Spark.
> Take some time to dig down their code to understand the logic.
>
> On Wed, Feb 11, 2015 at 2:25 PM, Marcelo Valle (BLOOMBERG/ LONDON) <
> mvallemil...@bloomberg.net> wrote:
>
>> Taking the opportunity Spark was being discussed in another thread, I
>> decided to start a new one as I have interest in using Spark + Cassandra in
>> the feature.
>>
>> About 3 years ago, Spark was not an existing option and we tried to use
>> hadoop to process Cassandra data. My experience was horrible and we reached
>> the conclusion it was faster to develop an internal tool than insist on
>> Hadoop _for our specific case_.
>>
>> How I can see Spark is starting to be known as a "better hadoop" and it
>> seems market is going this way now. I can also see I have many more options
>> to decide how to integrate Cassandra using the Spark RDD concept than using
>> the ColumnFamilyInputFormat.
>>
>> I have found this java driver made by Datastax:
>> https://github.com/datastax/spark-cassandra-connector
>>
>> I also have found python Cassandra support on spark's repo, but it seems
>> experimental yet:
>> https://github.com/apache/spark/tree/master/examples/src/main/python
>>
>> Finally I have found stratio deep: https://github.com/Stratio/deep-spark
>> It seems Stratio guys have forked Cassandra also, I am still a little
>> confused about it.
>>
>> Question: which driver should I use, if I want to use Java? And which if
>> I want to use python?
>> I think the way Spark can integrate to Cassandra makes all the difference
>> in the world, from my past experience, so I would like to know more about
>> it, but I don't even know which source code I should start looking...
>> I would like to integrate using python and or C++, but I wonder if it
>> doesn't pay the way to use the java driver instead.
>>
>> Thanks in advance