Start looking at the Spark/Cassandra connector here (in Scala): https://github.com/datastax/spark-cassandra-connector/tree/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector
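For orientation before you read the connector source: as far as I understand it, each Spark partition covers a Cassandra token range and advertises the replica nodes owning that range as its preferred locations, so the scheduler can try to run the task where the data lives. Below is a rough Python sketch of that idea only. It is not the connector's code; the ring layout, host names, and function names are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical ring: (end token of the range, replica hosts owning that range).
RING: List[Tuple[int, List[str]]] = [
    (100, ["node1", "node2"]),
    (200, ["node2", "node3"]),
    (300, ["node3", "node1"]),
]


@dataclass
class TokenRangePartition:
    index: int
    start_token: int      # exclusive lower bound of the range
    end_token: int        # inclusive upper bound of the range
    replicas: List[str]   # hosts that store the data for this range


def build_partitions(ring: List[Tuple[int, List[str]]]) -> List[TokenRangePartition]:
    """Create one partition per token range, remembering its replica hosts."""
    partitions = []
    start = 0
    for i, (end, hosts) in enumerate(ring):
        partitions.append(TokenRangePartition(i, start, end, list(hosts)))
        start = end
    return partitions


def preferred_locations(partition: TokenRangePartition) -> List[str]:
    """What a locality-aware RDD would report to the scheduler for this split."""
    return partition.replicas


def pick_executor(partition: TokenRangePartition, executors: List[str]) -> str:
    """Toy scheduler: prefer an executor on a replica host, else use the first one."""
    for host in preferred_locations(partition):
        if host in executors:
            return host
    return executors[0]
```

With this toy ring, a partition whose replicas are node2 and node3 gets scheduled on node3 when only node3 and node1 run executors; if no replica runs an executor, the task still runs, but it reads the data over the network.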
Data locality is provided by this method: https://github.com/datastax/spark-cassandra-connector/blob/master/spark-cassandra-connector/src/main/scala/com/datastax/spark/connector/rdd/CassandraRDD.scala#L329-L336

Start digging from there, all the way down through the code.

As for Stratio Deep, I can't tell how they did the integration with Spark. Take some time to dig through their code to understand the logic.

On Wed, Feb 11, 2015 at 2:25 PM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:

> Since Spark was being discussed in another thread, I decided to start a
> new one, as I am interested in using Spark + Cassandra in the future.
>
> About 3 years ago, Spark was not an existing option and we tried to use
> Hadoop to process Cassandra data. My experience was horrible and we
> reached the conclusion that it was faster to develop an internal tool
> than to insist on Hadoop _for our specific case_.
>
> As far as I can see, Spark is starting to be known as a "better Hadoop"
> and the market seems to be going this way now. I can also see that I have
> many more options for integrating Cassandra using the Spark RDD concept
> than using the ColumnFamilyInputFormat.
>
> I have found this Java driver made by Datastax:
> https://github.com/datastax/spark-cassandra-connector
>
> I have also found Python Cassandra support in Spark's repo, but it still
> seems experimental:
> https://github.com/apache/spark/tree/master/examples/src/main/python
>
> Finally, I have found Stratio Deep: https://github.com/Stratio/deep-spark
> It seems the Stratio guys have also forked Cassandra; I am still a little
> confused about it.
>
> Question: which driver should I use if I want to use Java? And which if I
> want to use Python?
> From my past experience, I think the way Spark integrates with Cassandra
> makes all the difference in the world, so I would like to know more about
> it, but I don't even know which source code I should start looking at...
> I would like to integrate using Python and/or C++, but I wonder whether
> it would pay off to use the Java driver instead.
>
> Thanks in advance