I rsync the spark-1.0.1 directory to all the nodes. Yep, one needs Spark on
all the nodes irrespective of Hadoop/YARN.
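Roughly what I run (the install path and hostnames are just placeholders):

  # copy the same Spark build to the same path on every node
  for host in node1 node2 node3; do
    rsync -az /opt/spark-1.0.1/ "$host":/opt/spark-1.0.1/
  done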
Cheers
On Tue, Jul 8, 2014 at 6:24 PM, Robert James wrote:
> I have a Spark app which runs well on local master. I'm now ready to
> put it on a cluster. What needs to be installed on the master? What
> needs to be installed on the workers?
Hi Robert,
If you're running Spark against YARN, you don't need to install anything
Spark-specific on all the nodes. For each application, the client will
copy the Spark jar to HDFS where the Spark processes can fetch it. For
faster app startup, you can copy the Spark jar to a public location on
HDFS so each application doesn't need to upload it again.
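Roughly, that flow can look like the sketch below (the HDFS path, assembly jar
filename, SPARK_JAR setting, and the class/jar names are illustrative
assumptions; check the Spark-on-YARN docs for your exact version):

  # upload the Spark assembly once so every app doesn't re-ship it
  hadoop fs -mkdir -p /user/spark/share/lib
  hadoop fs -put lib/spark-assembly-1.0.1-hadoop2.2.0.jar /user/spark/share/lib/
  # point submissions at the shared jar, then submit against YARN
  export SPARK_JAR=hdfs:///user/spark/share/lib/spark-assembly-1.0.1-hadoop2.2.0.jar
  ./bin/spark-submit --class com.example.MyApp --master yarn-cluster my-app.jar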
You can use the spark-ec2/bdutil scripts to set it up quickly on the AWS/GCE
cloud.
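For example, spark-ec2 is launched roughly like this from the ec2/ directory
of the Spark download (key pair, identity file, and cluster name are
placeholders):

  cd ec2
  ./spark-ec2 -k my-keypair -i ~/my-keypair.pem -s 3 launch my-spark-cluster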
If you want to set it up on your own, these are the things you will need
to do:
1. Make sure you have Java (7) installed on all machines.
2. Install and configure Spark (add all slave nodes in conf/slaves); see the
sketch below.
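A minimal sketch of step 2 for a standalone cluster (hostnames are
placeholders; run this from the Spark directory on the master):

  # list one worker hostname per line in conf/slaves
  echo "worker1" >> conf/slaves
  echo "worker2" >> conf/slaves
  # start the master and all listed workers
  ./sbin/start-all.sh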
I have a Spark app which runs well on local master. I'm now ready to
put it on a cluster. What needs to be installed on the master? What
needs to be installed on the workers?
If the cluster already has Hadoop or YARN or Cloudera, does it still
need an install of Spark?