You can run Spark against your Cassandra data directly, without using a shared filesystem, via the DataStax Spark Cassandra Connector:
https://github.com/datastax/spark-cassandra-connector

On Fri, Oct 9, 2015 at 6:09 AM Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:

> Hello,
>
> I saw this nice link from an event:
>
> http://www.datastax.com/dev/blog/zen-art-spark-maintenance
>
> I would like to test using Spark to perform some operations on a column
> family. My objective is reading from CF A and writing the output of my M/R
> job to CF B.
>
> That said, I've read this in Spark's FAQ
> (http://spark.apache.org/faq.html):
>
> "Do I need Hadoop to run Spark?
> No, but if you run on a cluster, you will need some form of shared file
> system (for example, NFS mounted at the same path on each node). If you
> have this type of filesystem, you can just deploy Spark in standalone mode."
>
> The question I ask is: if I don't want to have an HDFS installation just
> to run Spark on Cassandra, is my only option to have this NFS mounted over
> the network?
> It doesn't seem smart to me to use something like NFS to store Spark
> files, as it would probably affect performance, and at the same time I
> wouldn't like to have an additional HDFS cluster just to run jobs on
> Cassandra.
> Is there a way of using Cassandra itself as this "some form of shared
> file system"?
>
> -Marcelo
>
> << ideas don't deserve respect >>
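For illustration, here is a minimal Scala sketch of such a job using the connector's RDD API. The keyspace, table, column names, and contact point below (my_keyspace, cf_a, cf_b, id, value, 10.0.0.1) are placeholders, not anything from your schema:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CopyCfAToCfB {
  def main(args: Array[String]): Unit = {
    // Point Spark at the Cassandra cluster; no shared filesystem is configured.
    val conf = new SparkConf()
      .setAppName("cf-a-to-cf-b")
      .set("spark.cassandra.connection.host", "10.0.0.1") // placeholder contact point

    val sc = new SparkContext(conf)

    // Read column family A as an RDD of CassandraRow.
    val rows = sc.cassandraTable("my_keyspace", "cf_a")

    // Example map step: carry the key through and upper-case one value column.
    val transformed = rows.map { row =>
      (row.getString("id"), row.getString("value").toUpperCase)
    }

    // Write the result straight into column family B.
    transformed.saveToCassandra("my_keyspace", "cf_b", SomeColumns("id", "value"))

    sc.stop()
  }
}

Run it with the connector on the classpath (for example, spark-submit --packages with the connector's Maven coordinates matching your Spark and Scala versions). Since both input and output live in Cassandra, no HDFS or NFS is involved; the connector talks to the cluster directly and tries to schedule tasks on the nodes that own the data.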