You can run Spark against your Cassandra data directly, without using a shared filesystem, via the DataStax Spark Cassandra Connector:
https://github.com/datastax/spark-cassandra-connector

On Fri, Oct 9, 2015 at 6:09 AM Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:

> Hello,
>
> I saw this nice link from an event:
>
> http://www.datastax.com/dev/blog/zen-art-spark-maintenance
>
> I would like to test using Spark to perform some operations on a column
> family. My objective is reading from CF A and writing the output of my M/R
> job to CF B.
>
> That said, I've read this in Spark's FAQ
> (http://spark.apache.org/faq.html):
>
> "Do I need Hadoop to run Spark?
> No, but if you run on a cluster, you will need some form of shared file
> system (for example, NFS mounted at the same path on each node). If you
> have this type of filesystem, you can just deploy Spark in standalone mode."
>
> The question I ask is: if I don't want to have an HDFS installation just
> to run Spark on Cassandra, is my only option to have this NFS mounted over
> the network?
> It doesn't seem smart to me to use something like NFS to store Spark
> files, as it would probably affect performance, and at the same time I
> wouldn't like to have an additional HDFS cluster just to run jobs on
> Cassandra.
> Is there a way of using Cassandra itself as this "some form of shared
> file system"?
>
> -Marcelo
>
> << ideas don't deserve respect >>
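For illustration, here is a minimal Scala sketch of such a job using the connector's RDD API. The keyspace, table, column names, and contact point below (my_keyspace, cf_a, cf_b, id, value, 10.0.0.1) are placeholders, not anything from your schema:

import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object CopyCfAToCfB {
  def main(args: Array[String]): Unit = {
    // Point Spark at the Cassandra cluster; no shared filesystem is configured.
    val conf = new SparkConf()
      .setAppName("cf-a-to-cf-b")
      .set("spark.cassandra.connection.host", "10.0.0.1") // placeholder contact point

    val sc = new SparkContext(conf)

    // Read column family A as an RDD of CassandraRow.
    val rows = sc.cassandraTable("my_keyspace", "cf_a")

    // Example map step: carry the key through and upper-case one value column.
    val transformed = rows.map { row =>
      (row.getString("id"), row.getString("value").toUpperCase)
    }

    // Write the result straight into column family B.
    transformed.saveToCassandra("my_keyspace", "cf_b", SomeColumns("id", "value"))

    sc.stop()
  }
}

Run it with the connector on the classpath (for example, spark-submit --packages with the connector's Maven coordinates matching your Spark and Scala versions). Since both input and output live in Cassandra, no HDFS or NFS is involved; the connector talks to the cluster directly and tries to schedule tasks on the nodes that own the data.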